HAEMOPHILUS ADHERENCE AND PENETRATION PROTEINS

FIELD OF THE INVENTION 

The invention relates to Haemophilus adhesion and 
penetration proteins, nucleic acids, and vaccines. 

BACKGROUND OF THE INVENTION

Most bacterial diseases begin with colonization of a 
particular mucosal surface (Beachey et al., 1981, J. 
Infect. Dis. 143:325-345). Successful colonization 
requires that an organism overcome mechanical cleansing
of the mucosal surface and evade the local immune
response. The process of colonization is dependent upon
specialized microbial factors that promote binding to
host cells (Hultgren et al., 1993, Cell 73:887-901).
In some cases the colonizing organism will subsequently
enter (invade) these cells and survive intracellularly
(Falkow, 1991, Cell 65:1099-1102).

Haemophilus influenzae is a common commensal organism
of the human respiratory tract (Kuklinska and Kilian,
1984, Eur. J. Clin. Microbiol. 3:249-252). It is a
human-specific organism that normally resides in the
human nasopharynx and must colonize this site in order
to avoid extinction. This microbe has a number of
surface structures capable of promoting attachment to
host cells (Guerina et al., 1982, J. Infect. Dis.
146:564; Pichichero et al., 1982, Lancet ii:960-962; St.
Geme et al., 1993, Proc. Natl. Acad. Sci. U.S.A.
90:2875-2879). In addition, H. influenzae has acquired
the capacity to enter and survive within these cells
(Forsgren et al., 1994, Infect. Immun. 62:673-679; St.
Geme and Falkow, 1990, Infect. Immun. 58:4036-4044; St.
Geme and Falkow, 1991, Infect. Immun. 59:1325-1333,
Infect. Immun. 59:3366-3371). As a result, this
bacterium is an important cause of both localized
respiratory tract and systemic disease (Turk, 1984, J.
Med. Microbiol. 18:1-16). Nonencapsulated, non-typable
strains account for the majority of local disease (Turk,
1984, supra); in contrast, serotype b strains, which
express a capsule composed of a polymer of ribose and
ribitol-5-phosphate (PRP), are responsible for over 95%
of cases of H. influenzae systemic disease (Turk, 1982,
Clinical importance of Haemophilus influenzae, p. 3-9.
In S.H. Sell and P.F. Wright (ed.), Haemophilus
influenzae epidemiology, immunology, and prevention of
disease. Elsevier/North-Holland Publishing Co., New
York).

The initial step in the pathogenesis of disease due to
H. influenzae involves colonization of the upper
respiratory mucosa (Murphy et al., 1987, J. Infect. Dis.
5:723-731). Colonization with a particular strain may
persist for weeks to months, and most individuals remain
asymptomatic throughout this period (Spinola et al.,
1986, J. Infect. Dis. 154:100-109). However, in certain
circumstances colonization will be followed by
contiguous spread within the respiratory tract,
resulting in local disease in the middle ear, the
sinuses, the conjunctiva, or the lungs. Alternatively,
on occasion bacteria will penetrate the nasopharyngeal
epithelial barrier and enter the bloodstream.

In vitro observations and animal studies suggest that
bacterial surface appendages called pili (or fimbriae)
play an important role in H. influenzae colonization.
In 1982 two groups reported a correlation between
piliation and increased attachment to human
oropharyngeal epithelial cells and erythrocytes (Guerina
et al., supra; Pichichero et al., supra). Other
investigators have demonstrated that anti-pilus
antibodies block in vitro attachment by piliated H.
influenzae (Forney et al., 1992, J. Infect. Dis.
165:464-470; van Alphen et al., 1988, Infect. Immun.
56:1800-1806). Recently Weber et al. insertionally
inactivated the pilus structural gene in an H.
influenzae type b strain and thereby eliminated
expression of pili; the resulting mutant exhibited a
reduced capacity for colonization of year-old monkeys
(Weber et al., 1991, Infect. Immun. 59:4724-4728).

A number of reports suggest that nonpilus factors also
facilitate Haemophilus colonization. Using the human
nasopharyngeal organ culture model, Farley et al. (1986,
J. Infect. Dis. 161:274-280) and Loeb et al. (1988,
Infect. Immun. 49:484-489) noted that nonpiliated type
b strains were capable of mucosal attachment. Read and
coworkers made similar observations upon examining
nontypable strains in a model that employs nasal
turbinate tissue in organ culture (1991, J. Infect. Dis.
163:549-558). In the monkey colonization study by Weber
et al. (1991, supra), nonpiliated organisms retained a
capacity for colonization, though at reduced densities;
moreover, among monkeys originally infected with the
piliated strain, virtually all organisms recovered from
the nasopharynx were nonpiliated. All of these
observations are consistent with the finding that
nasopharyngeal isolates from children colonized with H.
influenzae are frequently nonpiliated (Mason et al.,
1985, Infect. Immun. 49:98-103; Brinton et al., 1989,
Pediatr. Infect. Dis. J. 8:554-561).

Previous studies have shown that H. influenzae is
capable of entering (invading) cultured human epithelial
cells via a pili-independent mechanism (St. Geme and
Falkow, 1990, supra; St. Geme and Falkow, 1991, supra).
Although H. influenzae is not generally considered an
intracellular parasite, a recent report suggests that
these in vitro findings may have an in vivo correlate
(Forsgren et al., 1994, supra). Forsgren and coworkers
examined adenoids from 10 children who had their
adenoids removed because of longstanding secretory
otitis media or adenoidal hypertrophy; in all 10 cases
there were viable intracellular H. influenzae. Electron
microscopy demonstrated that these organisms were
concentrated in the reticular crypt epithelium and in
macrophage-like cells in the subepithelial layer of
tissue. One possibility is that bacterial entry into
host cells provides a mechanism for evasion of the local
immune response, thereby allowing persistence in the
respiratory tract.

Thus, a vaccine for the therapeutic and prophylactic 
treatment of Haemophilus infection is desirable. 
Accordingly, it is an object of the present invention 
to provide for recombinant Haemophilus Adherence and 
Penetration (HAP) proteins and variants thereof, and to 
produce useful quantities of these HAP proteins using 
recombinant DNA techniques. 

It is a further object of the invention to provide
recombinant nucleic acids encoding HAP proteins, and
expression vectors and host cells containing the nucleic
acid encoding the HAP protein.

An additional object of the invention is to provide 
monoclonal antibodies for the diagnosis of Haemophilus 
infection. 

A further object of the invention is to provide methods 
for producing the HAP proteins, and a vaccine comprising 
the HAP proteins of the present invention. Methods for 
the therapeutic and prophylactic treatment of 
Haemophilus infection are also provided.

SUMMARY OF THE INVENTION 

In accordance with the foregoing objects, the present 
invention provides recombinant HAP proteins, and 
isolated or recombinant nucleic acids which encode the 
HAP proteins of the present invention. Also provided 
are expression vectors which comprise DNA encoding a HAP 
protein operably linked to transcriptional and 
translational regulatory DNA, and host cells which 
contain the expression vectors. 

The invention also provides methods for
producing HAP proteins which comprise culturing a host
cell transformed with an expression vector and causing 
expression of the nucleic acid encoding the HAP protein 
to produce a recombinant HAP protein. 

The invention also includes vaccines for Haemophilus 
influenzae infection comprising an HAP protein for 
prophylactic or therapeutic use in generating an immune 
response in a patient. Methods of treating or 
preventing Haemophilus influenzae infection comprise
administering a vaccine.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1A and 1B depict light micrographs of H.
influenzae strains DB117(pGJB103) and DB117(pN187)
incubated with Chang epithelial cells. Bacteria were
incubated with an epithelial monolayer for 30 minutes
before rinsing and staining with Giemsa stain. Figure
1A: H. influenzae strain DB117 carrying cloning vector
alone (pGJB103); Figure 1B: H. influenzae strain DB117
harboring recombinant plasmid pN187. Bar represents 3.5

Figures 2A, 2B, 2C and 2D depict thin section 
transmission electron micrographs demonstrating 
interaction of H. influenzae strains N187 and
DB117(pN187) with Chang epithelial cells. Bacteria were
incubated with epithelial monolayers for four hours
before rinsing and processing for examination by
transmission electron microscopy. Figure 2A: strain
N187 associated with the epithelial cell surface and
present in an intracellular location; Figure 2B: H.
influenzae DB117(pN187) in intimate contact with the
epithelial cell surface; Figure 2C: strain DB117(pN187)
in the process of entering an epithelial cell; Figure
2D: strain DB117(pN187) present in an intracellular
location. Bar represents 1 µm.

Figure 3 depicts outer membrane protein profiles of 
various strains. Outer membrane proteins were isolated 
on the basis of sarcosyl insolubility and resolved on 
a 10% SDS-polyacrylamide gel. Proteins were visualized 
by staining with Coomassie blue. Lane 1, H. influenzae 
strain DB117(pGJB103); lane 2, strain DB117(pN187); lane
3, strain DB117(pJS106); lane 4, E. coli HB101(pGJB103);
lane 5, HB101(pN187). Note novel proteins at ~160 kD
and 45 kD marked by asterisks in lanes 2 and 3.

Figure 4 depicts a restriction map of pN187 and
derivatives and locations of mini-Tn10 kan insertions.
pN187 is a derivative of pGJB103 that contains an 8.5-kb
Sau3AI fragment of chromosomal DNA from H. influenzae
strain N187. Vector sequences are represented by
hatched boxes. Letters above top horizontal line
indicate restriction enzyme sites: Bg, BglII; C, ClaI;
E, EcoRI; P, PstI. Numbers and lollipops above top
horizontal line show positions of mini-Tn10 kan
insertions; open lollipops represent insertions that
have no effect on adherence and invasion, while closed
lollipops indicate insertions that eliminate the
capacity of pN187 to promote association with epithelial
monolayers. Heavy horizontal line with arrow represents
location of hap locus within pN187 and direction of
transcription. (+): recombinant plasmids that promote
adherence and invasion; (-): recombinant plasmids that
fail to promote adherence and invasion.

Figure 5 depicts the identification of plasmid-encoded
proteins using the bacteriophage T7 expression system.
Bacteria were radiolabeled with [35S]methionine, and
whole cell lysates were resolved on a 10%
SDS-polyacrylamide gel. Proteins were visualized by
autoradiography. Lane 1, E. coli XL-1 Blue(pT7-7)
uninduced; lane 2, XL-1 Blue(pT7-7) induced with IPTG;
lane 3, XL-1 Blue(pJS103) uninduced; lane 4, XL-1
Blue(pJS103) induced with IPTG; lane 5, XL-1
Blue(pJS104) uninduced; lane 6, XL-1 Blue(pJS104)
induced with IPTG. The plasmids pJS103 and pJS104 are
derivatives of pT7-7 that contain the 6.5-kb PstI
fragment from pN187 in opposite orientations. Asterisk
indicates overexpressed protein in XL-1 Blue(pJS104).

Figures 6A, 6B, and 6C depict the nucleotide sequence 
and predicted amino acid sequence of the hap gene. Putative
-10 and -35 sequences 5' to the hap coding sequence are
underlined; a putative rho-independent terminator 3' to
the hap stop codon is indicated with inverted arrows. 
The first 25 amino acids of the protein, which are 
boxed, represent the signal sequence. 

Figures 7A, 7B, 7C, 7D, 7E, 7F, 7G, and 7H depict a 
sequence comparison of the hap product and the cloned 
H. influenzae IgA1 proteases. Amino acid homologies
between the deduced hap gene product and the iga gene
products from H. influenzae HK368, HK61, HK393, and
HK793 are shown. Dashes indicate gaps introduced in the
sequences in order to obtain maximal homology. A
consensus sequence for the five proteins is shown on the
lower line. The conserved serine-type protease 
catalytic domain is underlined, and the common active 
site serine is denoted by an asterisk. The conserved 
cysteines are also indicated by asterisks. 

Figure 8 depicts the IgA1 protease activity assay.
Culture supernatants were assayed for the ability to
cleave IgA1. Reaction mixtures were resolved on a 10%
SDS-polyacrylamide gel and then transferred to a
nitrocellulose membrane. The membrane was probed with
antibody against human IgA1 heavy chain. Lane 1, H.
influenzae strain N187; lane 2, strain DB117(pGJB103);
lane 3, strain DB117(pN187). The cleavage product
patterns suggest that strain N187 contains a type 2 IgA1
protease while strains DB117(pGJB103) and DB117(pN187)
contain a type 1 enzyme. The upper band of ~70-kD seen
with the DB117 derivatives represents intact IgA1 heavy
chain.

Figures 9A and 9B depict Southern analysis of
chromosomal DNA from H. influenzae strain N187, probing
with hap versus iga. DNA fragments were separated on
a 0.7% agarose gel and transferred bidirectionally to
nitrocellulose membranes prior to probing with either
hap or iga. Lane 1, N187 chromosomal DNA digested with
EcoRI; lane 2, N187 chromosomal DNA digested with BglII;
lane 3, N187 chromosomal DNA digested with BamHI; lane
4, the 4.8-kb ClaI-PstI fragment from pN187 that
contains the intact hap gene. Figure 9A: hybridization
with the 4.8-kb ClaI-PstI fragment containing the hap
gene; Figure 9B: hybridization with the iga gene from
H. influenzae strain Rd, carried as a 4.8-kb ClaI-EcoRI
fragment in pVD116.

Figure 10 depicts an SDS-polyacrylamide gel of secreted
proteins. Bacteria were grown to late log phase, and
culture supernatants were precipitated with
trichloroacetic acid and then resolved on a 10%
SDS-polyacrylamide gel. Proteins were visualized by
staining with Coomassie blue. Lane 1, H. influenzae
strain DB117(pGJB103); lane 2, DB117(pN187); lane 3,
DB117(pJS106); lane 4, DB117(pJS102); lane 5,
DB117(pJS105); lane 6, DB117(Tn10-18); lane 7,
DB117(Tn10-4'); lane 8, DB117(Tn10-30); lane 9,
DB117(Tn10-16); lane 10, DB117(Tn10-10); lane 11,
DB117(Tn10-8); lane 12, N187. Asterisk indicates 110-kD
secreted protein encoded by hap.

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides novel Haemophilus
Adhesion and Penetration (HAP) proteins. In a preferred
embodiment, the HAP proteins are from Haemophilus
strains, and in the preferred embodiment, from
Haemophilus influenzae. However, using the techniques
outlined below, HAP proteins from other Haemophilus
influenzae strains, or from other bacterial species such
as Neisseria spp. or Bordetella spp. may also be
obtained.

A HAP protein may be identified in several ways. A HAP 
nucleic acid or HAP protein is initially identified by 
substantial nucleic acid and/or amino acid sequence 
homology to the sequences shown in Figure 6. Such 
homology can be based upon the overall nucleic acid or 
amino acid sequence.

The HAP proteins of the present invention have limited 
homology to Haemophilus influenzae and N. gonorrhoeae 
serine-type IgA1 proteases. This homology, shown in
Figure 7, is approximately 30-35% at the amino acid
level, with several stretches showing 55-60% identity,
including amino acids 457-549, 399-466, 572-622, and
233-261. However, the homology between the HAP protein
and the IgA1 protease is considerably lower than the
similarity among the IgA1 proteases themselves.

In addition, the full length HAP protein has homology
to Tsh, a hemagglutinin expressed by an avian E. coli
strain (Provence and Curtiss, 1994, Infect. Immun.
62:1369-1380). The homology is greatest in the
N-terminal half of the proteins, and the overall homology
is 30.5%. The full length HAP protein also
has homology with pertactin, a 69 kD outer membrane
protein expressed by B. pertussis, with the middle
portion of the proteins showing 39% homology. Finally,
HAP has 34-52% homology with six regions of HpmA, a
calcium-independent hemolysin expressed by Proteus
mirabilis (Uphoff and Welch, 1990, J. Bacteriol.
172:1206-1216).

As used herein, a protein is a "HAP protein" if the 
overall homology of the protein sequence to the amino 
acid sequence shown in Figure 6 is preferably greater 
than about 40-50%, more preferably greater than about
60% and most preferably greater than 80%. In some
embodiments the homology will be as high as about 90 to
95 or 98%. This homology will be determined using
standard techniques known in the art, such as the Best
Fit sequence program described by Devereux et al., Nucl.
Acid Res. 12:387-395 (1984). The alignment may include
the introduction of gaps in the sequences to be aligned. 
In addition, for sequences which contain either more or 
fewer amino acids than the protein shown in Figure 6, 
it is understood that the percentage of homology will 
be determined based on the number of homologous amino 
acids in relation to the total number of amino acids. 
Thus, for example, homology of sequences shorter than 
that shown in Figure 6, as discussed below, will be 
determined using the number of amino acids in the 
shorter sequence. 
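
The percentage calculation described above can be illustrated with a
minimal sketch. It assumes that the two sequences have already been
aligned (gaps written as '-'), counts identical residues as
"homologous", and divides by the number of residues in the shorter
ungapped sequence. The function name percent_homology and the toy
sequences are hypothetical and are not taken from Figure 6; the sketch
is not the Best Fit program and does no alignment of its own.

```python
def percent_homology(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Gaps are written as '-'. The denominator is the residue count of the
    shorter ungapped sequence, following the convention stated in the
    text for sequences shorter than the reference (an interpretation,
    not a quotation of any particular program's scoring).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both sequences carry the same residue.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != "-"
    )
    len_a = len(aligned_a.replace("-", ""))
    len_b = len(aligned_b.replace("-", ""))
    shorter = min(len_a, len_b)
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


# Toy example with hypothetical sequences: a 7-residue fragment that
# matches the reference at every aligned position.
reference = "MKKTVFRLNF"
fragment = "MKK-VFRL--"
print(percent_homology(reference, fragment))
```

Because the denominator is the shorter sequence, a fragment that matches
the reference at every aligned residue scores 100% even though it covers
only part of the full-length protein.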

HAP proteins of the present invention may be shorter
than the amino acid sequence shown in Figure 6. As
shown in the Examples, the HAP protein may undergo
post-translational processing similar to that seen for the
serine-type IgA1 proteases expressed by Haemophilus
influenzae and N. gonorrhoeae. These proteases are
synthesized as preproteins with three functional
domains: the N-terminal signal peptide, the protease,
and a C-terminal helper domain. Following movement of
these proteins into the periplasmic space, the carboxy
terminal β-domain of the proenzyme is inserted into the
outer membrane, possibly forming a pore (Poulsen et al.,
1989, Infect. Immun. 57:3097-3105; Pohlner et al., 1987,
Nature (London) 325:458-462; Klauser et al., 1992,
EMBO J. 11:2327-2335; Klauser et al., 1993, J. Mol.
Biol. 234:579-593). Subsequently the amino end of the
protein is exported through the outer membrane, and
autoproteolytic cleavage occurs to result in secretion
of the mature 100 to 106-kD protease. The 45 to 56-kD
C-terminal β-domain remains associated with the outer
membrane following the cleavage event. As shown in the
Examples, the HAP nucleic acid is associated with
expression of a 160 kD outer membrane protein. The
secreted gene product is an approximately 110 kD
protein, with the simultaneous appearance of a 45 kD
outer membrane protein. The 45 kD protein appears to
correspond to amino acids from about 960 to about 1394
of Figure 6. Any one of these proteins is considered
a HAP protein for the purposes of this invention.
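
As a rough consistency check on the sizes quoted above, the following
sketch estimates molecular masses from residue counts. It assumes an
average residue mass of about 110 Da (a common rule of thumb, not a
value from this specification), takes the full length protein as
residues 1 to 1394 and the signal sequence as the first 25 residues per
the description of Figure 6, and treats "about 960" as an exact
boundary. The segment boundaries and the helper name approx_mass_kd are
illustrative assumptions only.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb assumption, not from the text


def approx_mass_kd(first: int, last: int) -> float:
    """Approximate mass in kD of residues first..last (1-based, inclusive)."""
    n_residues = last - first + 1
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


# Hypothetical segment boundaries derived from the numbers in the text.
segments = {
    "full length preprotein": (1, 1394),
    "secreted portion (assumed 26-959)": (26, 959),
    "outer membrane portion (about 960-1394)": (960, 1394),
}

for name, (first, last) in segments.items():
    print(f"{name}: ~{approx_mass_kd(first, last):.0f} kD")
```

Under these assumptions the estimates come out at roughly 153, 103 and
48 kD, which is in the same range as the approximately 160, 110 and
45 kD species described above; apparent sizes on SDS-polyacrylamide gels
commonly differ somewhat from sequence-based estimates.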

Thus, in a preferred embodiment, included within the 
definition of HAP proteins are portions or fragments of
the sequence shown in Figure 6. The fragments may be 
fragments of the entire sequence, the 110 kD sequence, 
or the 45 kD sequence. Generally, the HAP protein 
fragments may range in size from about 10 amino acids 
to about 1900 amino acids, with from about 50 to about 
1000 amino acids being preferred, and from about 100 to 
about 500 amino acids also preferred. Particularly 
preferred fragments are sequences unique to HAP; these 
sequences have particular use in cloning HAP proteins 
from other organisms or to generate antibodies specific
to HAP proteins. Unique sequences are easily identified
by those skilled in the art after examination of the HAP
protein sequence and comparison to other proteins; for
example, by examination of the sequence alignment shown
in Figure 7. For instance, as compared to the IgA
proteases, unique sequences include, but are not limited 
to, amino acids 11-14, 16-22, 108-120, 155-164, 257-265, 
281-288, 318-336, 345-353, 398-416, 684-693, 712-718, 
753-761, 871-913, 935-953, 985-1008, 1023-1034, 1067- 
1076, 1440-1048, 1585-1592, 1631-1639, 1637-1648, 1735- 
1743, 1863-1871, 1882-1891, 1929-1941, and 1958-1966 
(using the numbering of Figure 7). HAP protein
fragments which are included within the definition of
a HAP protein include N- or C-terminal truncations and
deletions which still allow the protein to be 
biologically active; for example, which still exhibit 
proteolytic activity in the case of the 110 kD putative 
protease sequence. In addition, when the HAP protein 
is to be used to generate antibodies, for example as a 
vaccine, the HAP protein must share at least one epitope 
or determinant with either the full length protein, the 
110 kD protein or the 45 kD protein, shown in Figure 6. 
In a preferred embodiment, the epitope is unique to the 
HAP protein; that is, antibodies generated to a unique 
epitope exhibit little or no cross-reactivity with other 
proteins. By "epitope" or "determinant" herein is meant 
a portion of a protein which will generate and/or bind 
an antibody. Thus, in most instances, antibodies made
to a smaller HAP protein will be able to bind to the 
full length protein. 

In some embodiments, the fragments of the HAP protein
used to generate antibodies are small; thus, they may
be used as haptens and coupled to protein carriers to
generate antibodies, as is known in the art.

Preferably, the antibodies are generated to a portion
of the HAP protein which remains attached to the
Haemophilus influenzae organism. For example, the HAP
protein can be used to vaccinate a patient to produce
antibodies which upon exposure to the Haemophilus
influenzae organism (e.g. during a subsequent infection)
bind to the organism and allow an immune response.
Thus, in one embodiment, the antibodies are generated
to the roughly 45 kD fragment of the full length HAP
protein. Preferably, the antibodies are generated to
the portion of the 45 kD fragment which is exposed at
the outer membrane.

In an alternative embodiment, the antibodies bind to the
mature secreted 110 kD fragment. For example, as
explained in detail below, the HAP proteins of the
present invention may be administered therapeutically
to generate neutralizing antibodies to the 110 kD
putative protease, to decrease the undesirable effects
of the 110 kD fragment.

In the case of the nucleic acid, the overall homology
of the nucleic acid sequence is commensurate with amino
acid homology but takes into account the degeneracy in
the genetic code and codon bias of different organisms.
Accordingly, the nucleic acid sequence homology may be
either lower or higher than that of the protein
sequence. Thus the homology of the nucleic acid
sequence as compared to the nucleic acid sequence of
Figure 6 is preferably greater than 40%, more preferably
greater than about 60% and most preferably greater than
80%. In some embodiments the homology will be as high
as about 90 to 95 or 98%.
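
The effect of codon degeneracy on nucleic acid homology can be shown
with a small sketch. It builds the standard genetic code table, then
compares two short hypothetical coding sequences (not taken from
Figure 6) that use different synonymous codons. The names translate and
percent_identity are illustrative. The example shows only the case in
which nucleic acid identity is lower than protein identity.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence whose length is a multiple of 3."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(a: str, b: str) -> float:
    """Percent of positions that are identical in two equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Two hypothetical sequences using different synonymous codons.
seq_a = "ATGCTGAGCCGT"
seq_b = "ATGTTATCAAGA"

print(translate(seq_a), translate(seq_b))
print(percent_identity(translate(seq_a), translate(seq_b)))
print(percent_identity(seq_a, seq_b))
```

Both sequences encode the same four-residue peptide, so the protein
identity is 100% while fewer than half of the nucleotide positions
agree.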

In one embodiment, the nucleic acid homology is 
determined through hybridization studies. Thus, for 
example, nucleic acids which hybridize under high 
stringency to all or part of the nucleic acid sequence 
shown in Figure 6 are considered HAP protein genes. 
High stringency conditions include washes with 0.1X SSC
at 65°C for 2 hours.

The HAP proteins and nucleic acids of the present
invention are preferably recombinant. As used herein,
"nucleic acid" may refer to either DNA or RNA, or
molecules which contain both deoxy- and ribonucleotides.
The nucleic acids include genomic DNA, cDNA and
oligonucleotides including sense and anti-sense nucleic
acids. Specifically included within the definition of
nucleic acid are anti-sense nucleic acids. An anti-sense
nucleic acid will hybridize to the corresponding
non-coding strand of the nucleic acid sequence shown in
Figure 6, but may contain ribonucleotides as well as
deoxyribonucleotides. Generally, anti-sense nucleic
acids function to prevent expression of mRNA, such that
a HAP protein is not made, or made at reduced levels.
The nucleic acid may be double stranded, single
stranded, or contain portions of both double stranded
and single stranded sequence. By the term "recombinant
nucleic acid" herein is meant nucleic acid, originally
formed in vitro by the manipulation of nucleic acid by
endonucleases, in a form not normally found in nature.
Thus an isolated HAP protein gene, in a linear form, or
an expression vector formed in vitro by ligating DNA
molecules that are not normally joined, are both
considered recombinant for the purposes of this
invention. It is understood that once a recombinant
nucleic acid is made and reintroduced into a host cell
or organism, it will replicate non-recombinantly, i.e.
using the in vivo cellular machinery of the host cell
rather than in vitro manipulations; however, such
nucleic acids, once produced recombinantly, although
subsequently replicated non-recombinantly, are still
considered recombinant for the purposes of the
invention.

Similarly, a "recombinant protein" is a protein made 
using recombinant techniques, i.e., through the
expression of a recombinant nucleic acid as described
above. A recombinant protein is distinguished from
naturally occurring protein by one or more
characteristics. For example, the protein may be
isolated away from some or all of the proteins and 
compounds with which it is normally associated in its 
wild type host, or found in the absence of the host 
cells themselves. Thus, the protein may be partially 
or substantially purified. The definition includes the 
production of a HAP protein from one organism in a 
different organism or host cell. Alternatively, the 
protein may be made at a significantly higher 
concentration than is normally seen, through the use of 
an inducible promoter or high expression promoter, such
that the protein is made at increased concentration 
levels. Alternatively, the protein may be in a form not 
normally found in nature, as in the addition of an 
epitope tag or amino acid substitutions, insertions and 
deletions. 

Also included within the definition of HAP protein are HAP
proteins from other organisms, which are cloned and
expressed as outlined below.

In the case of anti-sense nucleic acids, an anti-sense
nucleic acid is defined as one which will hybridize to
all or part of the corresponding non-coding sequence of
the sequence shown in Figure 6. Generally, the
hybridization conditions used for the determination of
anti-sense hybridization will be high stringency
conditions, such as 0.1X SSC at 65°C.
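
Hybridization of this kind rests on Watson-Crick base pairing, so a
nucleic acid that hybridizes to a given strand is the reverse complement
of that strand. The sketch below shows that relationship for a short
hypothetical DNA sequence (not taken from Figure 6). The function name
reverse_complement is illustrative, and stringency conditions such as
salt concentration and temperature are not modeled.

```python
# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    dna = dna.upper()
    if set(dna) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G and T")
    return dna.translate(COMPLEMENT)[::-1]


# Hypothetical target strand and the strand expected to pair with it.
target = "ATGCTGAGCCGT"
partner = reverse_complement(target)
print(partner)
```

Applying the function twice returns the original strand, which reflects
the fact that the two strands of a duplex are reverse complements of
each other.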

Once the HAP protein nucleic acid is identified, it can 
be cloned and, if necessary, its constituent parts 
recombined to form the entire HAP protein nucleic acid. 
Once isolated from its natural source, e.g., contained 
within a plasmid or other vector or excised therefrom 
as a linear nucleic acid segment, the recombinant HAP 
protein nucleic acid can be further used as a probe to 
identify and isolate other HAP protein nucleic acids. 
It can also be used as a "precursor" nucleic acid to 
make modified or variant HAP protein nucleic acids and 
proteins.

Using the nucleic acids of the present invention which 
encode HAP protein, a variety of expression vectors are 
made. The expression vectors may be either self- 
replicating extrachromosomal vectors or vectors which 
integrate into a host genome. Generally, these 
expression vectors include transcriptional and 
translational regulatory nucleic acid operably linked 
to the nucleic acid encoding the HAP protein. "Operably 
linked" in this context means that the transcriptional 
and translational regulatory DNA is positioned relative 
to the coding sequence of the HAP protein in such a 
manner that transcription is initiated. Generally, this 
will mean that the promoter and transcriptional 
initiation or start sequences are positioned 5' to the 
HAP protein coding region. The transcriptional and
translational regulatory nucleic acid will generally be
appropriate to the host cell used to express the HAP 
protein; for example, transcriptional and translational 
regulatory nucleic acid sequences from Bacillus will be 
used to express the HAP protein in Bacillus. Numerous
types of appropriate expression vectors, and suitable 
regulatory sequences are known in the art for a variety 
of host cells. 

In general, the transcriptional and translational 
regulatory sequences may include, but are not limited 
to, promoter sequences, leader or signal sequences, 
ribosomal binding sites, transcriptional start and stop 
sequences, translational start and stop sequences, and 
enhancer or activator sequences. In a preferred 
embodiment, the regulatory sequences include a promoter 
and transcriptional start and stop sequences. 

Promoter sequences encode either constitutive or 
inducible promoters. The promoters may be either 
naturally occurring promoters or hybrid promoters. 
Hybrid promoters, which combine elements of more than 
one promoter, are also known in the art, and are useful 
in the present invention. 

In addition, the expression vector may comprise 
additional elements. For example, the expression vector 
may have two replication systems, thus allowing it to 
be maintained in two organisms, for example in mammalian 
or insect cells for expression and in a procaryotic host 
for cloning and amplification. Furthermore, for 
integrating expression vectors, the expression vector 
contains at least one sequence homologous to the host 
cell genome, and preferably two homologous sequences 
which flank the expression construct. The integrating 
vector may be directed to a specific locus in the host
cell by selecting the appropriate homologous sequence
for inclusion in the vector. Constructs for integrating
vectors are well known in the art.

In addition, in a preferred embodiment, the expression
vector contains a selectable marker gene to allow the
selection of transformed host cells. Selection genes
are well known in the art and will vary with the host
cell used.

The HAP proteins of the present invention are produced
by culturing a host cell transformed with an expression
vector containing nucleic acid encoding a HAP protein,
under the appropriate conditions to induce or cause
expression of the HAP protein. The conditions
appropriate for HAP protein expression will vary with
the choice of the expression vector and the host cell,
and will be easily ascertained by one skilled in the art
through routine experimentation. For example, the use
of constitutive promoters in the expression vector will
require optimizing the growth and proliferation of the
host cell, while the use of an inducible promoter
requires the appropriate growth conditions for
induction. In addition, in some embodiments, the timing
of the harvest is important. For example, the
baculoviral systems used in insect cell expression are
lytic viruses, and thus harvest time selection can be
crucial for product yield.

Appropriate host cells include yeast, bacteria,
archaebacteria, fungi, and insect and animal cells,
including mammalian cells. Of particular interest are
Drosophila melanogaster cells, Saccharomyces cerevisiae
and other yeasts, E. coli, Bacillus subtilis, SF9 cells,
C129 cells, 293 cells, Neurospora, BHK, CHO, COS, and
HeLa cells, and immortalized mammalian myeloid and
lymphoid cell lines.

In a preferred embodiment, HAP proteins are expressed
in bacterial systems. Bacterial expression systems are
well known in the art.

A suitable bacterial promoter is any nucleic acid
sequence capable of binding bacterial RNA polymerase and
initiating the downstream (3') transcription of the
coding sequence of HAP protein into mRNA. A bacterial
promoter has a transcription initiation region which is
usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region
typically includes an RNA polymerase binding site and
a transcription initiation site. Sequences encoding
metabolic pathway enzymes provide particularly useful
promoter sequences. Examples include promoter sequences
derived from sugar metabolizing enzymes, such as
galactose, lactose and maltose, and sequences derived
from biosynthetic enzymes such as tryptophan. Promoters
from bacteriophage may also be used and are known in the
art. In addition, synthetic promoters and hybrid
promoters are also useful; for example, the tac promoter
is a hybrid of the trp and lac promoter sequences.
Furthermore, a bacterial promoter can include naturally
occurring promoters of non-bacterial origin that have
the ability to bind bacterial RNA polymerase and
initiate transcription.

In addition to a functioning promoter sequence, an
efficient ribosome binding site is desirable. In E.
coli, the ribosome binding site is called the Shine-
Dalgarno (SD) sequence and includes an initiation codon
and a sequence 3-9 nucleotides in length located 3-11
nucleotides upstream of the initiation codon.
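
The spacing rule for the ribosome binding site can be illustrated
with a short Python sketch. It is only an illustration: the
consensus AGGAGG, the minimum match length of four bases, and the
function name sd_candidates are assumptions introduced here and do
not come from the specification. The 3-11 nucleotide gap follows
the text.

```python
import re

# Assumed core consensus for an E. coli Shine-Dalgarno sequence (illustrative).
SD_CONSENSUS = "AGGAGG"


def sd_candidates(dna, min_match=4, min_gap=3, max_gap=11):
    """Report SD-like motifs lying 3-11 nt upstream of each ATG.

    Returns a list of (atg_index, motif, gap) tuples, where gap is the number
    of nucleotides between the end of the motif and the A of the ATG.
    Only the longest, closest motif is reported for each ATG.
    """
    dna = dna.upper()
    # Every substring of the consensus with at least min_match bases, longest first.
    motifs = sorted(
        {
            SD_CONSENSUS[i:j]
            for i in range(len(SD_CONSENSUS))
            for j in range(i + min_match, len(SD_CONSENSUS) + 1)
        },
        key=len,
        reverse=True,
    )
    hits = []
    for atg in (m.start() for m in re.finditer("(?=ATG)", dna)):
        found = False
        for motif in motifs:
            for gap in range(min_gap, max_gap + 1):
                start = atg - gap - len(motif)
                if start >= 0 and dna[start:start + len(motif)] == motif:
                    hits.append((atg, motif, gap))
                    found = True
                    break
            if found:
                break
    return hits


if __name__ == "__main__":
    # Toy sequence, not derived from the hap gene.
    print(sd_candidates("TTTAGGAGGTTTTTTATGAAA"))
```

A scan of this kind only flags candidates; whether a given site
supports efficient translation would still need to be tested
experimentally.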

The expression vector may also include a signal peptide 
sequence that provides for secretion of the HAP protein 
in bacteria. The signal sequence typically encodes a
signal peptide comprised of hydrophobic amino acids
which direct the secretion of the protein from the cell,
as is well known in the art. The protein is either
secreted into the growth media (gram-positive bacteria)
or into the periplasmic space, located between the inner
and outer membrane of the cell (gram-negative bacteria).

The bacterial expression vector may also include a 
selectable marker gene to allow for the selection of 
bacterial strains that have been transformed. Suitable 
selection genes include genes which render the bacteria 
resistant to drugs such as ampicillin, chloramphenicol, 
erythromycin, kanamycin, neomycin and tetracycline.
Selectable markers also include biosynthetic genes, such 
as those in the histidine, tryptophan and leucine 
biosynthetic pathways. 

These components are assembled into expression vectors. 
Expression vectors for bacteria are well known in the
art, and include vectors for Bacillus subtilis, E. coli,
Streptococcus cremoris, and Streptococcus lividans,
among others.

The bacterial expression vectors are transformed into 
bacterial host cells using techniques well known in the 
art, such as calcium chloride treatment, 
electroporation, and others. 



WO 96/05858 



PCT/US95/10661 






In one embodiment, HAP proteins are produced in insect
cells. Expression vectors for the transformation of
insect cells, and in particular, baculovirus-based
expression vectors, are well known in the art. Briefly,
baculovirus is a very large DNA virus which produces its
coat protein at very high levels. Due to the size of
the baculoviral genome, exogenous genes must be placed
in the viral genome by recombination. Accordingly, the
components of the expression system include: a transfer
vector, usually a bacterial plasmid, which contains both
a fragment of the baculovirus genome, and a convenient
restriction site for insertion of the nucleic acid
encoding the HAP protein; a wild type baculovirus with a
sequence homologous to the baculovirus-specific fragment
in the transfer vector (this allows for the homologous
recombination of the heterologous gene into the
baculovirus genome); and appropriate insect host cells
and growth media.

Mammalian expression systems are also known in the art
and are used in one embodiment. A mammalian promoter
is any DNA sequence capable of binding mammalian RNA
polymerase and initiating the downstream (3')
transcription of a coding sequence for HAP protein into
mRNA. A promoter will have a transcription initiating
region, which is usually placed proximal to the 5' end
of the coding sequence, and a TATA box, usually located
25-30 base pairs upstream of the transcription
initiation site. The TATA box is thought to direct RNA
polymerase II to begin RNA synthesis at the correct
site. A mammalian promoter will also contain an
upstream promoter element, typically located within 100
to 200 base pairs upstream of the TATA box. An upstream
promoter element determines the rate at which
transcription is initiated and can act in either
orientation. Of particular use as mammalian promoters
are the promoters from mammalian viral genes, since the
viral genes are often highly expressed and have a broad
host range. Examples include the SV40 early promoter,
mouse mammary tumor virus LTR promoter, adenovirus major
late promoter, and herpes simplex virus promoter.

Typically, transcription termination and polyadenylation
sequences recognized by mammalian cells are regulatory
regions located 3' to the translation stop codon and
thus, together with the promoter elements, flank the
coding sequence. The 3' terminus of the mature mRNA is
formed by site-specific post-transcriptional cleavage and
polyadenylation. Examples of transcription terminator
and polyadenylation signals include those derived from
SV40.

The methods of introducing exogenous nucleic acid into
mammalian hosts, as well as other hosts, are well known
in the art, and will vary with the host cell used.
Techniques include dextran-mediated transfection,
calcium phosphate precipitation, polybrene-mediated
transfection, protoplast fusion, electroporation,
encapsulation of the polynucleotide(s) in liposomes, and
direct microinjection of the DNA into nuclei.

In a preferred embodiment, HAP protein is produced in
yeast cells. Yeast expression systems are well known
in the art, and include expression vectors for
Saccharomyces cerevisiae, Candida albicans and C.
maltosa, Hansenula polymorpha, Kluyveromyces fragilis
and K. lactis, Pichia guilliermondii and P. pastoris,
Schizosaccharomyces pombe, and Yarrowia lipolytica.
Preferred promoter sequences for expression in yeast
include the inducible GAL1,10 promoter, the promoters
from alcohol dehydrogenase, enolase, glucokinase,
glucose-6-phosphate isomerase, glyceraldehyde-3-
phosphate dehydrogenase, hexokinase,
phosphofructokinase, 3-phosphoglycerate mutase, pyruvate
kinase, and the acid phosphatase gene. Yeast selectable
markers include ADE2, HIS4, LEU2, TRP1, and ALG7, which
confers resistance to tunicamycin; the G418 resistance
gene, which confers resistance to G418; and the CUP1
gene, which allows yeast to grow in the presence of
copper ions.

A recombinant HAP protein may be expressed
intracellularly or secreted. The HAP protein may also
be made as a fusion protein, using techniques well known 
in the art. Thus, for example, if the desired epitope 
is small, the HAP protein may be fused to a carrier 
protein to form an immunogen. Alternatively, the HAP 
protein may be made as a fusion protein to increase 
expression. 

Also included within the definition of HAP proteins of 
the present invention are amino acid sequence variants.
These variants fall into one or more of three classes: 
substitutional, insertional or deletional variants. 
These variants ordinarily are prepared by site specific 
mutagenesis of nucleotides in the DNA encoding the HAP 
protein, using cassette mutagenesis or other techniques 
well known in the art, to produce DNA encoding the 
variant, and thereafter expressing the DNA in 
recombinant cell culture as outlined above. However, 
variant HAP protein fragments having up to about 100-150 
residues may be prepared by in vitro synthesis using 
established techniques. Amino acid sequence variants 
are characterized by the predetermined nature of the 
variation, a feature that sets them apart from naturally 
occurring allelic or interspecies variation of the HAP
protein amino acid sequence. The variants typically
exhibit the same qualitative biological activity as the 
naturally occurring analogue, although variants can also 
be selected which have modified characteristics as will 
be more fully outlined below. 

While the site or region for introducing an amino acid 
sequence variation is predetermined, the mutation per 
se need not be predetermined. For example, in order to 
optimize the performance of a mutation at a given site, 
random mutagenesis may be conducted at the target codon 
or region and the expressed HAP protein variants 
screened for the optimal combination of desired 
activity. Techniques for making substitution mutations 
at predetermined sites in DNA having a known sequence 
are well known, for example, M13 primer mutagenesis. 
Screening of the mutants is done using assays of HAP 
protein activities; for example, mutated HAP genes are 
placed in HAP deletion strains and tested for HAP 
activity, as disclosed herein. The creation of deletion 
strains, given a gene sequence, is known in the art. 
For example, nucleic acid encoding the variants may be 
expressed in a Haemophilus influenzae strain deficient 
in the HAP protein, and the adhesion and infectivity of
the variant Haemophilus influenzae evaluated. 
Alternatively, the variant HAP protein may be expressed 
and its biological characteristics evaluated, for 
example its proteolytic activity. 

Amino acid substitutions are typically of single 
residues; insertions usually will be on the order of 
from about 1 to 20 amino acids, although considerably 
larger insertions may be tolerated. Deletions range 
from about 1 to 30 residues, although in some cases
deletions may be much larger, as for example when one
of the domains of the HAP protein is deleted.

Substitutions, deletions, insertions or any combination 
thereof may be used to arrive at a final derivative. 
Generally these changes are done on a few amino acids 
to minimize the alteration of the molecule. However, 
larger changes may be tolerated in certain 
circumstances.

When small alterations in the characteristics of the HAP 
protein are desired, substitutions are generally made 
in accordance with the following chart: 

Chart I

Original Residue     Exemplary Substitutions

Ala                  Ser
Arg                  Lys
Asn                  Gln, His
Asp                  Glu
Cys                  Ser
Gln                  Asn
Glu                  Asp
Gly                  Pro
His                  Asn, Gln
Ile                  Leu, Val
Leu                  Ile, Val
Lys                  Arg, Gln, Glu
Met                  Leu, Ile
Phe                  Met, Leu, Tyr
Ser                  Thr
Thr                  Ser
Trp                  Tyr
Tyr                  Trp, Phe
Val                  Ile, Leu

Substantial changes in function or immunological
identity are made by selecting substitutions that are
less conservative than those shown in Chart I. For
example, substitutions may be made which more
significantly affect: the structure of the polypeptide
backbone in the area of the alteration, for example the
alpha-helical or beta-sheet structure; the charge or
hydrophobicity of the molecule at the target site; or
the bulk of the side chain. The substitutions which in
general are expected to produce the greatest changes in
the polypeptide's properties are those in which (a) a
hydrophilic residue, e.g. seryl or threonyl, is
substituted for (or by) a hydrophobic residue, e.g.
leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b)
a cysteine or proline is substituted for (or by) any
other residue; (c) a residue having an electropositive
side chain, e.g. lysyl, arginyl, or histidyl, is
substituted for (or by) an electronegative residue, e.g.
glutamyl or aspartyl; or (d) a residue having a bulky
side chain, e.g. phenylalanine, is substituted for (or
by) one not having a side chain, e.g. glycine.
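
As an illustration only, the pairs of Chart I can be held in a
lookup table so that a proposed substitution is classed as falling
within the chart or outside it. The dictionary below is a
transcription of Chart I as read from this text; the names CHART_I
and is_chart_i_substitution are hypothetical and introduced here.

```python
# Exemplary substitutions transcribed from Chart I (three-letter codes).
CHART_I = {
    "Ala": ["Ser"],
    "Arg": ["Lys"],
    "Asn": ["Gln", "His"],
    "Asp": ["Glu"],
    "Cys": ["Ser"],
    "Gln": ["Asn"],
    "Glu": ["Asp"],
    "Gly": ["Pro"],
    "His": ["Asn", "Gln"],
    "Ile": ["Leu", "Val"],
    "Leu": ["Ile", "Val"],
    "Lys": ["Arg", "Gln", "Glu"],
    "Met": ["Leu", "Ile"],
    "Phe": ["Met", "Leu", "Tyr"],
    "Ser": ["Thr"],
    "Thr": ["Ser"],
    "Trp": ["Tyr"],
    "Tyr": ["Trp", "Phe"],
    "Val": ["Ile", "Leu"],
}


def is_chart_i_substitution(original, replacement):
    """True if replacing `original` with `replacement` is listed in Chart I."""
    return replacement in CHART_I.get(original, [])


if __name__ == "__main__":
    print(is_chart_i_substitution("Ala", "Ser"))  # listed in the chart
    print(is_chart_i_substitution("Ser", "Leu"))  # hydrophilic to hydrophobic, not listed
```

A substitution that is not listed is simply one the chart does not
name; whether it produces a substantial change still depends on the
criteria (a) through (d) above.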

The variants typically exhibit the same qualitative 
biological activity and will elicit the same immune 
response as the naturally-occurring analogue, although 
variants also are selected to modify the characteristics 
of the polypeptide as needed. Alternatively, the 
variant may be designed such that the biological 
activity of the HAP protein is altered. For example, 
the proteolytic activity of the larger 110 kD domain of 
the HAP protein may be altered, through the substitution 
of the amino acids of the active site. The putative 
catalytic domain of this protein is GDSGSPMF, with the 
first serine corresponding to the active site serine 
characteristic of serine type proteases. The residues 
of the active site may be individually or simultaneously 
altered to decrease or eliminate proteolytic activity. 
This may be done to decrease the toxicity or side
effects of the vaccine. Similarly, the cleavage site
between the 45 kD domain and the 100 kD domain may be
altered, for example to eliminate proteolytic processing
to form the two domains. Putatively this site is at
residue 960.
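
A minimal sketch of how an active-site variant might be specified in
silico follows. It locates the GDSGSPMF motif in a protein sequence
and replaces the first serine of the motif. The choice of alanine as
the replacement, the function name, and the toy sequence are
assumptions made for illustration; the specification does not name a
particular replacement residue.

```python
MOTIF = "GDSGSPMF"  # putative catalytic motif named in the text


def mutate_active_site_serine(protein, replacement="A"):
    """Replace the first serine of the GDSGSPMF motif.

    Returns (variant_sequence, one_based_position_of_the_serine).
    Raises ValueError if the motif is absent.
    """
    idx = protein.find(MOTIF)
    if idx == -1:
        raise ValueError("GDSGSPMF motif not found")
    ser_index = idx + MOTIF.index("S")  # first serine within the motif
    variant = protein[:ser_index] + replacement + protein[ser_index + 1:]
    return variant, ser_index + 1


if __name__ == "__main__":
    # Hypothetical toy sequence, not the Hap sequence.
    toy = "MKKT" + MOTIF + "LLNA"
    print(mutate_active_site_serine(toy))
```

The resulting variant sequence would then be encoded by site
specific mutagenesis and tested for loss of proteolytic activity as
described above.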

In a preferred embodiment, the HAP protein is purified 
or isolated after expression. HAP proteins may be 
isolated or purified in a variety of ways known to those 
skilled in the art depending on what other components 
are present in the sample. Standard purification
methods include electrophoretic, molecular,
immunological and chromatographic techniques, including
ion exchange, hydrophobic, affinity, and reverse-phase
HPLC chromatography, and chromatofocusing. For example,
the HAP protein may be purified using a standard
anti-HAP antibody column. Ultrafiltration and
diafiltration techniques, in conjunction with protein
concentration, are also useful. For general guidance in
suitable purification techniques, see Scopes, R., Protein
Purification, Springer-Verlag, NY (1982). The degree
of purification necessary will vary depending on the use 
of the HAP protein. In some instances no purification 
will be necessary. 

Once expressed and purified if necessary, the HAP 
proteins are useful in a number of applications. 

For example, the HAP proteins can be coupled, using 
standard technology, to affinity chromatography columns. 
These columns may then be used to purify antibodies from 
samples obtained from animals or patients exposed to the 
Haemophilus influenzae organism. The purified
antibodies may then be used as outlined below.

Additionally, the HAP proteins are useful to make
antibodies to HAP proteins. These antibodies find use
in a number of applications. In a preferred embodiment,
the antibodies are used to diagnose the presence of a
Haemophilus influenzae infection in a sample or patient.
This will be done using techniques well known in the
art; for example, samples such as blood or tissue
samples may be obtained from a patient and tested for
reactivity with the antibodies, for example using
standard techniques such as ELISA. In a preferred
embodiment, monoclonal antibodies are generated to the
HAP protein, using techniques well known in the art.
As outlined above, the antibodies may be generated to
the full length HAP protein, or a portion of the HAP
protein.

Antibodies generated to HAP proteins may also be used
in passive immunization treatments, as is known in the
art.

Antibodies generated to unique sequences of HAP proteins
may also be used to screen expression libraries from
other organisms to find, and subsequently clone, HAP
nucleic acids from other organisms.

In one embodiment, the antibodies may be directly or
indirectly labelled. By "labelled" herein is meant a
compound that has at least one element, isotope or
chemical compound attached to enable the detection of
the compound. In general, labels fall into three
classes: a) isotopic labels, which may be radioactive
or heavy isotopes; b) immune labels, which may be
antibodies or antigens; and c) colored or fluorescent
dyes. The labels may be incorporated into the compound
at any position. Thus, for example, the HAP protein
antibody may be labelled for detection, or a secondary
antibody to the HAP protein antibody may be created and
labelled.

In one embodiment, the antibodies generated to the HAP 
proteins of the present invention are used to purify or 
separate HAP proteins or the Haemophilus influenzae 
organism from a sample. Thus for example, antibodies 
generated to HAP proteins which will bind to the 
Haemophilus influenzae organism may be coupled, using 
standard technology, to affinity chromatography columns. 
These columns can be used to pull out the Haemophilus 
organism from environmental or tissue samples. 
Alternatively, antibodies generated to the soluble 110
kD portion of the full-length protein shown in Figure 7
may be used to purify the 110 kD protein from samples.

In a preferred embodiment, the HAP proteins of the 
present invention are used as vaccines for the 
prophylactic or therapeutic treatment of a Haemophilus 
influenzae infection in a patient. By "vaccine" herein 
is meant an antigen or compound which elicits an immune 
response in an animal or patient. The vaccine may be 
administered prophylactically, for example to a patient
never previously exposed to the antigen, such that 
subsequent infection by the Haemophilus influenzae 
organism is prevented. Alternatively, the vaccine may 
be administered therapeutically to a patient previously 
exposed or infected by the Haemophilus influenzae 
organism. While infection cannot be prevented, in this 
case an immune response is generated which allows the 
patient's immune system to more effectively combat the 
infection. Thus, for example, there may be a decrease 
or lessening of the symptoms associated with infection.

A "patient" for the purposes of the present invention
includes both humans and other animals and organisms.
Thus the methods are applicable to both human therapy
and veterinary applications.

The administration of the HAP protein as a vaccine is
done in a variety of ways. Generally, the HAP proteins
can be formulated according to known methods to prepare
pharmaceutically useful compositions, whereby
therapeutically effective amounts of the HAP protein are
combined in admixture with a pharmaceutically acceptable
carrier vehicle. Suitable vehicles and their
formulation are well known in the art. Such
compositions will contain an effective amount of the HAP
protein together with a suitable amount of vehicle in
order to prepare pharmaceutically acceptable
compositions for effective administration to the host.
The composition may include salts, buffers, carrier
proteins such as serum albumin, targeting molecules to
localize the HAP protein at the appropriate site or
tissue within the organism, and other molecules. The
composition may include adjuvants as well.

In one embodiment, the vaccine is administered as a
single dose; that is, one dose is adequate to induce a
sufficient immune response to prophylactically or
therapeutically treat a Haemophilus influenzae
infection. In alternate embodiments, the vaccine is
administered as several doses over a period of time, as
a primary vaccination and "booster" vaccinations.

By "therapeutically effective amounts" herein is meant
an amount of the HAP protein which is sufficient to
induce an immune response. This amount may be different
depending on whether prophylactic or therapeutic
treatment is desired. Generally, this ranges from about
0.001 mg to about 1 gm, with a preferred range of about
0.05 to about , and the preferred dose being . 

These amounts may be adjusted if adjuvants are used. 

The following examples serve to more fully describe the
manner of using the above-described invention, as well
as to set forth the best modes contemplated for carrying
out various aspects of the invention. It is understood
that these examples in no way serve to limit the true
scope of this invention, but rather are presented for
illustrative purposes.

EXAMPLES 

Example 1 
Cloning of the HAP protein 

Bacterial strains, plasmids, and phage. H. influenzae
strain N187 is a clinical isolate that was originally
cultivated from the middle ear fluid of a child with
acute otitis media. This strain was classified as
nontypable based on the absence of agglutination with
typing antisera for H. influenzae types a-f (Burroughs
Wellcome) and the failure to hybridize with pU038, a
plasmid that contains the entire cap b locus (Kroll and
Moxon, 1988, J. Bacteriol. 170:859-864).

H. influenzae strain DB117 is a rec1 mutant of Rd, a
capsule-deficient serotype d strain that has been in the
laboratory for over 40 years (Alexander and Leidy, 1951,
J. Exp. Med. 83:345-359); DB117 was obtained from G.
Barcak (University of Maryland, Baltimore, MD) (Setlow
et al., 1968). DB117 is deficient for in vitro
adherence and invasion, as assayed below.




H. influenzae strain 12 is the nontypable strain from
which the genes encoding the HMW1 and HMW2 proteins were
cloned (Barenkamp and Leininger, 1992, Infect. Immun.
60:1302-1313); HMW1 and HMW2 are the prototypic members
of a family of nontypable Haemophilus
antigenically-related high-molecular-weight adhesive
proteins (St. Geme et al., 1993).

E. coli HB101, which is nonadherent and noninvasive, has
been previously described (Sambrook et al., 1989,
Molecular cloning: a laboratory manual, 2nd ed. Cold
Spring Harbor Laboratory, Cold Spring Harbor, N.Y.).
E. coli DH5α was obtained from Bethesda Research
Laboratories. E. coli MC1061 was obtained from H.
Kimsey (Tufts University, Boston, MA). E. coli XL-1
Blue and the plasmid pBluescript KS- were obtained from
Stratagene. Plasmid pT7-7 and phage mGP1-2 were
provided by S. Tabor (Harvard Medical School, Boston,
MA) (Tabor and Richardson, 1985, Proc. Natl. Acad. Sci.
USA. 82:1074-1078). The E. coli-Haemophilus shuttle
vector pGJB103 (Tomb et al., 1989, Rd. J. Bacteriol.
171:3796-3802) and phage λ1105 (Way et al., 1984, Gene.
32:369-379) were provided by G. Barcak (University of
Maryland, Baltimore, MD). Plasmid pVD116 harbors the
IgA1 protease gene from H. influenzae strain Rd (Koomey
and Falkow, 1984, Infect. Immun. 43:101-107) and was
obtained from M. Koomey (University of Michigan, Ann
Arbor, MI).

Growth conditions. H. influenzae strains were grown as
described (Anderson et al., 1972, J. Clin. Invest.
51:31-38). They were stored at -80°C in brain heart
infusion broth with 25% glycerol. E. coli strains were
grown on LB agar or in LB broth. They were stored at
-80°C in LB broth with 50% glycerol.
For H. influenzae, tetracycline was used in a
concentration of 5 µg/ml and kanamycin was used in a
concentration of 25 µg/ml. For E. coli, antibiotics
were used in the following concentrations:
tetracycline, 12.5 µg/ml; kanamycin, 50 µg/ml;
ampicillin, 100 µg/ml.
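
For bench use, the working concentrations stated above can be kept
in a small lookup table and converted to a volume of stock solution.
This is a convenience sketch only: the stock concentrations passed
in the example call are assumptions, since the text gives final
concentrations but no stock strengths, and the function name is
hypothetical.

```python
# Final antibiotic concentrations in micrograms per ml, as stated in the text.
WORKING_UG_PER_ML = {
    "H. influenzae": {"tetracycline": 5.0, "kanamycin": 25.0},
    "E. coli": {"tetracycline": 12.5, "kanamycin": 50.0, "ampicillin": 100.0},
}


def stock_volume_ul(organism, antibiotic, media_ml, stock_mg_per_ml):
    """Microlitres of stock needed to reach the working concentration.

    total micrograms = final (ug/ml) x media volume (ml); a stock of
    X mg/ml holds X ug per microlitre, so the volume is total / X.
    """
    final_ug_per_ml = WORKING_UG_PER_ML[organism][antibiotic]
    total_ug = final_ug_per_ml * media_ml
    return total_ug / stock_mg_per_ml


if __name__ == "__main__":
    # Assumed 100 mg/ml ampicillin stock, 500 ml of LB broth.
    print(stock_volume_ul("E. coli", "ampicillin", 500, 100))
```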

Recombinant DNA methods. DNA ligations, restriction
endonuclease digestions, and gel electrophoresis were
performed according to standard techniques (Sambrook et
al., 1989, supra). Plasmids were introduced into E.
coli strains by either chemical transformation or
electroporation, as described (Sambrook et al., 1989,
supra; Dower et al., 1988, Nucleic Acids Res.
16:6127-6145). In H. influenzae, transformation was
performed using the MIV method of Herriott et al. (1970,
J. Bacteriol. 101:517-524), and electroporation was
carried out using the protocol developed for E. coli
(Dower et al., 1988, supra).

Construction of genomic library from H. influenzae
strain N187. High-molecular-weight chromosomal DNA was
prepared from 3 ml of an overnight broth culture of H.
influenzae N187 as previously described (Mekalanos,
1983, Cell. 35:253-263). Following partial digestion
with Sau3AI, 8 to 12 kb fragments were eluted onto DEAE
paper (Schleicher & Schuell, Keene, N.H.) and then
ligated to BglII-digested, calf intestine
phosphatase-treated pGJB103. The ligation mixture was
electroporated into H. influenzae DB117, and
transformants were selected on media containing
tetracycline.
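
The size selection in the library construction can be pictured with
a short in silico sketch. Sau3AI cuts at GATC, and the resulting
ends are compatible with those left by BglII (AGATCT), which is why
the fragments can be ligated into the BglII-cut vector. The code
lists every fragment bounded by two GATC sites whose length falls in
a chosen window, as a partial digest could produce. The function
names and the toy sequence are hypothetical, and cut positions are
reported at the start of the recognition site for simplicity.

```python
import re


def cut_positions(dna, site="GATC"):
    """Zero-based start index of every occurrence of the recognition site."""
    pattern = "(?=" + re.escape(site) + ")"
    return [m.start() for m in re.finditer(pattern, dna.upper())]


def partial_digest_fragments(dna, min_len, max_len, site="GATC"):
    """Fragments between any two cut sites with min_len <= length <= max_len.

    A partial digest leaves some internal sites uncut, so every pair of
    sites (not only adjacent ones) can bound a fragment.
    """
    cuts = cut_positions(dna, site)
    fragments = []
    for i, start in enumerate(cuts):
        for end in cuts[i + 1:]:
            if min_len <= end - start <= max_len:
                fragments.append((start, end))
    return fragments


if __name__ == "__main__":
    # Toy sequence with three GATC sites; a real genome would use 8000-12000.
    toy = "AAGATCAAAAGATCAAAAAAGATCAA"
    print(cut_positions(toy))
    print(partial_digest_fragments(toy, 9, 18))
```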

Transposon mutagenesis.

Mutagenesis of plasmid DNA was performed using the
mini-Tn10 kan element described by Way et al. (1984,
supra). Initially, the appropriate plasmid was introduced
into E. coli MC1061. The resulting strain was infected
with λ1105, which carries the mini-Tn10 kan transposon.
Transductants were grown overnight in the presence of
kanamycin and an antibiotic to select for the plasmid,
and plasmid DNA was isolated using the alkaline lysis
method. In order to recover plasmids containing a
transposon insertion, plasmid DNA was electroporated
into E. coli DH5α, plating on media containing kanamycin
and the appropriate second antibiotic.

In order to establish more precisely the region of pN187
involved in promoting interaction with host cells,
initially this plasmid was subjected to restriction
endonuclease analysis. Subsequently, several subclones
were constructed in the vector pGJB103 and were
reintroduced into H. influenzae strain DB117. The
resulting strains were then examined for adherence and
invasion. As summarized in Figure 4, subclones
containing either a 3.9-kb PstI-BglII fragment (pJS105)
or the adjoining 4.2-kb BglII fragment (pJS102) failed
to confer the capacity to associate with Chang cells.
In contrast, a subclone containing an insert that
included portions of both of these fragments (pJS106)
did promote interaction with epithelial monolayers.
Transposon mutagenesis performed on pN187 confirmed that
the flanking portions of the insert in this plasmid were
not required for the adherent/invasive phenotype. On
the other hand, a transposon insertion located adjacent
to the BglII site in pJS106 eliminated adherence and
invasion. An insertion between the second EcoRI and
PstI sites in this plasmid had a similar effect (Figure
4).

Examination of plasmid-encoded proteins.

In order to examine plasmid-encoded proteins, relevant
DNA was ligated into the bacteriophage T7 expression
vector pT7-7, and the resulting construct was
transformed into E. coli XL-1 Blue. Plasmid pT7-7
contains the T7 phage φ10 promoter and ribosomal binding
site upstream of a multiple cloning site (Tabor and
Richardson, 1985, supra). The T7 promoter was induced
by infection with the recombinant M13 phage mGP1-2 and
addition of isopropyl-β-D-thiogalactopyranoside (final
concentration, 1 mM). Phage mGP1-2 contains the gene
encoding T7 RNA polymerase, which activates the φ10
promoter in pT7-7 (Tabor and Richardson, 1985, supra).

Like DB117(pN187), strain DB117 carrying pJS106
expressed new outer membrane proteins 160-kD and 45-kD
in size (Figure 3, lane 3). In order to examine whether
the 6.5-kb insert in pJS106 actually encodes these
proteins, this fragment of DNA was ligated into the
bacteriophage T7 expression vector pT7-7. The resulting
plasmid containing the insert in the same orientation
as in pN187 was designated pJS104, and the plasmid with
the insert in the opposite orientation was designated
pJS103. Both pJS104 and pJS103 were introduced into
E. coli XL-1 Blue, producing XL-1 Blue(pJS104) and XL-1
Blue(pJS103), respectively. As a negative control,
pT7-7 was also transformed into XL-1 Blue. The T7
promoter was induced in these three strains by infection
with the recombinant M13 phage mGP1-2 and addition of
isopropyl-β-D-thiogalactopyranoside (final
concentration, 1 mM), and induced proteins were detected
using [35S]methionine. As shown in Figure 5, induction
of XL-1 Blue(pJS104) resulted in expression of a 160-kD
protein and several smaller proteins which presumably
represent degradation products. In contrast, when XL-1
Blue(pJS103) and XL-1 Blue(pT7-7) were induced, there
was no expression of these proteins. There was no 45-kD
protein induced in any of the three strains. This
experiment suggested that the 6.5-kb insert present in
pJS106 contains the structural gene for the 160-kD outer
membrane protein identified in DB117(pJS106). On the
other hand, this analysis failed to establish the origin
of the 45-kD membrane protein expressed by
DB117(pJS106).

Adherence and invasion assays.

Adherence and invasion assays were performed with Chang
epithelial cells [Wong-Kilbourne derivative, clone
1-5c-4 (human conjunctiva)], which were seeded into
wells of 24-well tissue culture plates as previously
described (St. Geme and Falkow, 1990). Adherence was
measured after incubating bacteria with epithelial
monolayers for 30 minutes as described (St. Geme et al.,
1993). Invasion assays were carried out according to our
original protocol and involved incubating bacteria with
epithelial cells for four hours followed by treatment
with gentamicin for two hours (100 µg/ml) (St. Geme and
Falkow, 1990).

Nucleotide sequence determination and analysis.

Nucleotide sequence was determined using a Sequenase kit
and double stranded plasmid template. DNA fragments
were subcloned into pBluescript KS- and sequenced along
both strands by primer walking. DNA sequence analysis
was performed using the Genetics Computer Group (GCG)
software package from the University of Wisconsin
(Devereux et al., 1984). Sequence similarity searches
were carried out using the BLAST program of the National
Center for Biotechnology Information (Altschul et al.,
1990, J. Mol. Biol. 215:403-410). The DNA sequence
described here will be deposited in the
EMBL/GenBank/DDBJ Nucleotide Sequence Data Libraries.

Based on our subcloning results, we reasoned that
the central BglII site in pN187 was positioned within
an open reading frame. Examination of a series of
mini-Tn10 kan mutants supported this conclusion (Figure 4).
Consequently, we sequenced DNA on either side of this
BglII site and identified a 4182 bp gene, which we have
designated hap for Haemophilus adherence and penetration
(Figure 6). This gene encodes a 1394 amino acid
polypeptide, which we have called Hap, with a calculated
molecular mass of 155.4 kD, in good agreement with the
molecular mass of the larger of the two novel outer
membrane proteins expressed by DB117(pN187) and the
protein expressed after induction of XL-1 Blue/pJS104.
The hap gene has a G+C content of 39.1%, similar to the
published estimate of 38.7% for the whole genome
(Kilian, 1976, J. Gen. Microbiol. 93:9-62). Putative
-10 and -35 promoter sequences are present upstream of
the initiation codon. A consensus ribosomal binding
site is lacking. A sequence similar to a
rho-independent transcription terminator is present
beginning 39 nucleotides beyond the stop codon and
contains interrupted inverted repeats with the potential
for forming a hairpin structure containing a loop of
three bases and a stem of eight bases. Similar to the
situation with typical E. coli terminators, this
structure is followed by a stretch rich in T residues.
Analysis of the predicted amino acid sequence suggested
the presence of a 25 amino acid signal peptide at the
amino terminus. This region has characteristics typical
of procaryotic signal peptides, with three positive
N-terminal charges, a central hydrophobic region, and
alanine residues at positions 23 and 25 (-3 and -1
relative to the putative cleavage site) (von Heijne,
1984, J. Mol. Biol. 173:243-251).
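
The figures in the preceding paragraph can be cross-checked against the
sequence listing. The sketch below is illustrative only: it uses the CDS
coordinates given for SEQ ID NO:1 and the first 33 residues of SEQ ID NO:2
written in one-letter code, and the variable and function names are
assumptions introduced here, not part of the original analysis.

```python
# Figures taken from the text and from SEQ ID NO:1 / SEQ ID NO:2.
CDS_START, CDS_END = 60, 4241   # CDS location in SEQ ID NO:1 (1-based, inclusive)
SIGNAL_PEPTIDE_LENGTH = 25      # predicted signal peptide length from the text

# First 33 residues of Hap from SEQ ID NO:2, in one-letter code.
HAP_N_TERMINUS = "MKKTVFRLNFLTACISLGIVSQAWAGHTYFGID"


def residue_at(sequence: str, position: int) -> str:
    """Return the residue at a 1-based position."""
    return sequence[position - 1]


gene_length_bp = CDS_END - CDS_START + 1
codon_count = gene_length_bp // 3

signal_peptide = HAP_N_TERMINUS[:SIGNAL_PEPTIDE_LENGTH]
# Lys (K) and Arg (R) are the positively charged residues counted here.
positive_charges = sum(residue in "KR" for residue in signal_peptide)

print(gene_length_bp, codon_count)   # 4182 1394
print(positive_charges)              # 3
print(residue_at(HAP_N_TERMINUS, 23), residue_at(HAP_N_TERMINUS, 25))  # A A
```

The counts agree with the 4182 bp gene, the 1394 amino acid polypeptide,
the three positive charges, and the alanines at positions 23 and 25.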

Comparison of the deduced amino acid sequence of Hap
with other proteins. A protein sequence similarity
search was performed with the predicted amino acid
sequence using the BLAST network service of the National
Center for Biotechnology Information (Altschul et al.,
1990, supra). This search revealed homology with the
IgA1 proteases of H. influenzae and Neisseria
gonorrhoeae. Alignment of the derived amino acid
sequences for the hap gene product and the IgA1
proteases from four different H. influenzae strains
revealed homology across the extent of the proteins
(Figure 7), with several stretches showing 55-60%
identity and 70-80% similarity. Similar levels of
homology were noted between the hap product and the IgA1
protease from N. gonorrhoeae strain MS11. This
homology includes the region identified as the catalytic
site of the IgA1 proteases, which is comprised of the
sequence GDSGSPLF, where S is the active site serine
characteristic of serine proteases (Brenner, 1988,
Nature (London) 334:528-530; Poulsen et al., 1992, J.
Bacteriol. 174:2913-2921). In the case of Hap, the
corresponding sequence is GDSGSPMF. The hap product
also contains two cysteines corresponding to the
cysteines proposed to be important in forming the
catalytic domain of the IgA proteases (Pohlner et al.,
1987, supra). Overall there is 30-35% identity and
51-55% similarity between the hap gene product and the H.
influenzae and N. gonorrhoeae IgA proteases.
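
To make the catalytic-site comparison concrete, the following minimal sketch
compares the two eight-residue motifs quoted above position by position. It
is a plain ungapped comparison written for illustration; it is not the BLAST
or GCG alignment used for the reported homology figures, and the function
name is an assumption.

```python
def compare_motifs(first: str, second: str):
    """Return percent identity and the 1-based mismatches of two equal-length motifs."""
    if len(first) != len(second):
        raise ValueError("motifs must have the same length")
    mismatches = [
        (position, a, b)
        for position, (a, b) in enumerate(zip(first, second), start=1)
        if a != b
    ]
    identity = 100.0 * (len(first) - len(mismatches)) / len(first)
    return identity, mismatches


IGA1_PROTEASE_MOTIF = "GDSGSPLF"   # catalytic-site sequence of the IgA1 proteases
HAP_MOTIF = "GDSGSPMF"             # corresponding sequence in Hap

identity, mismatches = compare_motifs(IGA1_PROTEASE_MOTIF, HAP_MOTIF)
print(identity)     # 87.5
print(mismatches)   # [(7, 'L', 'M')]
```

The two motifs differ only at the seventh position, where leucine in the
IgA1 proteases is replaced by methionine in Hap.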

The deduced amino acid sequence encoded by hap was also
found to contain significant homology to Tsh, a
hemagglutinin expressed by an avian E. coli strain
(Provence and Curtiss, 1994, supra). This homology
extends throughout both proteins but is greatest in the
N-terminal half of each. Overall the two proteins are
30.5% identical and 51.6% similar. Tsh is also
synthesized as a preprotein and is secreted as a smaller
form; like the IgA1 proteases and perhaps Hap, a
carboxy-terminal peptide remains associated with the outer
membrane (D. Provence, personal communication). While
this protein is presumed to have proteolytic activity,
its substrate has not yet been determined.
Interestingly, Tsh was first identified on the basis of
its capacity to promote agglutination of erythrocytes.
Thus Hap and Tsh are possibly the first members of a
novel class of adhesive proteins that are processed
analogously to the IgA1 proteases.

Homology was also noted with pertactin, a 69-kD outer
membrane protein expressed by B. pertussis (Charles et
al., 1989, Proc. Natl. Acad. Sci. USA 86:3554-3558).
The middle portions of these two molecules are 39%
identical and nearly 60% similar. Pertactin contains
the amino acid triplet arginine-glycine-aspartic acid
(RGD) and has been shown to promote attachment to
cultured mammalian cells via this sequence (Leininger
et al., 1991, Proc. Natl. Acad. Sci. USA 88:345-349).
Although Bordetella species are not generally considered
intracellular parasites, work by Ewanowich and coworkers
indicates that these respiratory pathogens are capable
of in vitro entry into human epithelial cells (Ewanowich
et al., 1989, Infect. Immun. 57:2698-2704; Ewanowich et
al., 1989, Infect. Immun. 57:1240-1247). Recently
Leininger et al. reported that preincubation of
epithelial monolayers with an RGD-containing peptide
derived from the pertactin sequence specifically
inhibited B. pertussis entry (Leininger et al., 1992,
Infect. Immun. 60:2380-2385). In addition, these
investigators found that coating of Staphylococcus
aureus with purified pertactin resulted in more
efficient S. aureus entry; the RGD-containing peptide
from pertactin inhibited this pertactin-enhanced entry
by 75%. Although the hap product lacks an RGD motif,
it is possible that Hap and pertactin serve similar
biologic functions for H. influenzae and Bordetella
species, respectively.

Additional analysis revealed significant homology (34
to 52% identity, 42 to 70% similarity) with six regions
of HpmA, a calcium-independent hemolysin expressed by
Proteus mirabilis (Uphoff and Welch, 1990, supra).

The hap locus is distinct from the H. influenzae IgA1
protease gene.

Given the degree of similarity between the hap gene
product and H. influenzae IgA1 protease, we wondered
whether we had isolated the IgA1 protease gene of strain
N187. To examine this possibility, we performed IgA1
protease activity assays. Among H. influenzae strains,
two enzymatically distinct types of IgA1 protease have
been found (Mulks et al., 1982, J. Infect. Dis.
146:266-274). Type 1 enzymes cleave the Pro-Ser peptide bond
between residues 231 and 232 in the hinge region of
human IgA1 heavy chain and generate fragments of roughly
28 kD and 31 kD; type 2 enzymes cleave the Pro-Thr bond
between residues 235 and 236 in the hinge region and
generate 26.5-kD and 32.5-kD fragments. Previous
studies of the parent strain from which DB117 was
derived have demonstrated that this strain produces a
type 1 IgA1 protease (Koomey and Falkow, 1984, supra).
As shown in Figure 8, comparison of the proteolytic
activities of strain DB117 and strain N187 suggested
that N187 produces a type 2 IgA1 protease. We reasoned
that if pN187 carried the N187 IgA1 protease gene,
DB117(pN187) would generate a total of four fragments
from IgA1, consistent with two distinct cleavage
specificities. Examination of DB117(pN187) revealed
instead that this transformant produces the same two
fragments of the IgA1 heavy chain as does DB117, arguing
that this strain produces only a type 1 enzyme.
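
The reasoning behind this assay can be written out as a small sketch. The
fragment sizes are the approximate values quoted above; the dictionary and
function names are illustrative assumptions, and the code models only the
expected banding pattern, not the assay itself.

```python
# Approximate IgA1 heavy chain fragment sizes (kD) quoted in the text.
FRAGMENTS_KD = {
    "type 1": (28.0, 31.0),   # cleavage at the Pro-Ser bond, residues 231/232
    "type 2": (26.5, 32.5),   # cleavage at the Pro-Thr bond, residues 235/236
}


def expected_fragments(enzyme_types):
    """Return the sorted set of fragment sizes expected for the given enzyme types."""
    return sorted({size for kind in enzyme_types for size in FRAGMENTS_KD[kind]})


# A strain making both enzymes would be expected to yield four fragments.
print(expected_fragments(["type 1", "type 2"]))   # [26.5, 28.0, 31.0, 32.5]
# A strain making only the type 1 enzyme yields two, as observed for DB117(pN187).
print(expected_fragments(["type 1"]))             # [28.0, 31.0]
```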

In an effort to obtain additional evidence against the
possibility that plasmid pN187 contains the N187 IgA1
protease gene, we performed a series of Southern blots.
As shown in Figure 9, when genomic DNA from strain N187
was digested with EcoRI, BglII, or BamHI and then probed
with the hap gene, one set of hybridizing fragments was
detected. Probing of the same DNA with the iga gene
from H. influenzae strain Rd resulted in a different set
of hybridizing bands. Moreover, the iga gene failed to
hybridize with a purified 4.8-kb fragment that contained
the intact hap gene.

The recombinant plasmid associated with adherence and 
invasion encodes a secreted protein. 

The striking homology between the hap gene product and
the Haemophilus and Neisseria IgA1 proteases suggested
the possibility that these proteins might be processed
in a similar manner. The IgA1 proteases are synthesized
as preproteins with three functional domains: the
N-terminal signal peptide, the protease, and a C-terminal
helper domain, which is postulated to form a pore in the
outer membrane for secretion of the protease (Poulsen
et al., 1989, supra; Pohlner et al., 1987, supra). The
C-terminal peptide remains associated with the outer
membrane following an autoproteolytic cleavage event
that results in release of the mature enzyme.
Consistent with the possibility that the hap gene
product follows a similar fate, we found that
DB117(pN187) produced a secreted protein approximately
110 kD in size that was absent from DB117(pGJB103)
(Figure 10). This protein was also produced by
DB117(pJS106), but not by DB117(pJS102) or
DB117(pJS105). Furthermore, the two mutants with
transposon insertions within the hap coding region were
deficient in this protein. In order to determine the
relationship between hap and the secreted protein, this
protein was transferred to a PVDF membrane and
N-terminal amino acid sequencing was performed. Excessive
background on the first cycle precluded identification
of the first amino acid residue of the free amino
terminus. The sequence of the subsequent seven residues
was found to be HTYFGID, which corresponds to amino
acids 27 through 33 of the hap product.
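
The mapping of the Edman sequence onto the predicted protein can be
reproduced with a simple string search. This is a minimal sketch using the
first 33 residues of SEQ ID NO:2 in one-letter code; the function and
variable names are illustrative assumptions.

```python
# First 33 residues of Hap from SEQ ID NO:2, in one-letter code.
HAP_N_TERMINUS = "MKKTVFRLNFLTACISLGIVSQAWAGHTYFGID"
EDMAN_RESIDUES = "HTYFGID"   # cycles 2 through 8 of N-terminal sequencing


def locate(peptide: str, protein: str):
    """Return the 1-based start and end positions of peptide within protein."""
    index = protein.find(peptide)
    if index < 0:
        raise ValueError("peptide not found in protein")
    start = index + 1
    return start, start + len(peptide) - 1


start, end = locate(EDMAN_RESIDUES, HAP_N_TERMINUS)
print(start, end)            # 27 33

# The unreadable first cycle then corresponds to the residue just before the match.
first_mature_position = start - 1
print(first_mature_position) # 26
```

Position 26 is the first residue after the predicted 25 amino acid signal
peptide, so the sequencing result is consistent with cleavage at the
putative signal peptidase site.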

Introduction of hap into laboratory strains of E.
coli did not endow these organisms with
the capacity for adherence or invasion. In considering
these results, it is noteworthy that the E. coli
transformants failed to express either the 160-kD or the
45-kD outer membrane protein. Accordingly, they also
failed to express the 110-kD secreted protein. The
explanation for this lack of expression is unclear. One
possibility is that the H. influenzae promoter or
ribosomal binding site was poorly recognized in E. coli.
Indeed, the putative -35 sequence upstream of the hap
initiation codon is fairly divergent from the σ70
consensus sequence, and the ribosomal binding site is
unrecognizable. Alternatively, an accessory gene may
be required for proper export of the Hap protein,
although the striking homology with the IgA proteases,
which are normally expressed and secreted in E. coli,
argues against this hypothesis.

In considering the possibility that the hap gene product
promotes adherence and invasion by directly binding to
a host cell surface structure, it seems curious that the
mature protein is secreted from the organism. However,
there are examples of other adherence factors that are
also secreted. Filamentous hemagglutinin is a 220-kD
protein expressed by B. pertussis that mediates in vitro
adherence and facilitates natural colonization (Relman
et al., 1989, Proc. Natl. Acad. Sci. U.S.A.
86:2637-2641; Kimura et al., 1990, Infect. Immun. 58:7-16).
This protein remains surface-associated to some extent
but is also released from the cell. The process of
filamentous hemagglutinin secretion involves an
accessory protein designated FhaC, which appears to be
localized to the outer membrane (Willems et al., 1994,
Molec. Microbiol. 11:337-347). Similarly, the Ipa
proteins implicated in Shigella invasion are also
secreted. Secretion of these proteins requires the
products of multiple genes within the mxi and spa loci
(Allaoui et al., 1993, Molec. Microbiol. 7:59-68;
Andrews et al., 1991, Infect. Immun. 59:1997-2005;
Venkatesan et al., 1992, J. Bacteriol. 174:1990-2001).

It is conceivable that secretion is simply a consequence
of the mechanism for export of the hap gene product to
the surface of the organism. However, it is noteworthy
that the secreted protein contains a serine-type
protease catalytic domain and shows homology with the
P. mirabilis hemolysin. These findings suggest that the
mature Hap protein may possess proteolytic activity and
raise the possibility that Hap promotes interaction with
the host cell at a distance by modifying the host cell
surface. Alternatively, Hap may modify the bacterial
surface in order to facilitate interaction with a host
cell receptor. It is possible that hap encodes a
molecule with dual functions, serving as both adhesin
and protease.

Analysis of outer membrane and secreted proteins. 

Outer membrane proteins were isolated on the basis of
sarcosyl insolubility according to the method of Carlone
et al. (1986, J. Clin. Microbiol. 24:330-332). Secreted
proteins were isolated by centrifuging bacterial
cultures at 16,000 x g for 10 minutes, recovering the
supernatant, and precipitating with trichloroacetic acid
at a final concentration of 10%. SDS-polyacrylamide gel
electrophoresis was performed as previously described
(Laemmli, 1970, Nature (London) 227:680-685).

To identify proteins that might be involved in the 
interaction with the host cell surface, outer membrane 
protein profiles for DB117(pN187) and DB117(pGJB103)
were compared. As shown in Figure 3, DB117(pN187)
expressed two new outer membrane proteins: a high- 
molecular-weight protein approximately 160-kD in size 
and a 45-kD protein. E. coli HB101 harboring pN187 
failed to express these proteins, suggesting an 
explanation for the observation that HB101(pN187) is 
incapable of adherence or invasion. 

Previous studies have demonstrated that a family of
antigenically related high-molecular-weight proteins
with similarity to filamentous hemagglutinin of
Bordetella pertussis mediate attachment by nontypable
H. influenzae to cultured epithelial cells (St. Geme et
al., 1993). To explore the possibility that the gene
encoding the strain N187 member of this family was
cloned, whole cell lysates of N187, DB117(pN187), and
DB117(pGJB103) were examined by Western immunoblot. Our
control strain for this experiment was H. influenzae
strain 12. Using a polyclonal antiserum directed
against HMW1 and HMW2, the prototypic proteins in this
family, we identified a 140-kD protein in strain N187
(not shown). In contrast, this antiserum failed to
react with either DB117(pN187) or DB117(pGJB103) (not
shown), indicating that pN187 has no relationship to HMW
protein expression.

Determination of amino terminal sequence. Secreted
proteins were precipitated with trichloroacetic acid,
separated on a 10% SDS-polyacrylamide gel, and
electrotransferred to a polyvinylidene difluoride (PVDF)
membrane (Matsudaira, 1987, J. Biol. Chem.
262:10035-10038). Following staining with Coomassie Brilliant
Blue R-250, the 110-kD protein was cut from the PVDF
membrane and submitted to the Protein Chemistry
Laboratory at Washington University School of Medicine
for amino terminal sequence determination. Sequence
analysis was performed by automated Edman degradation
using an Applied Biosystems Model 470A protein
sequencer.






Examination of IgA1 protease activity. In order to
assess IgA1 protease activity, bacteria were inoculated
into broth and grown aerobically overnight. Samples
were then centrifuged in a microfuge for two minutes,
and supernatants were collected. A 10 µl volume of
supernatant was mixed with 16 µl of 0.5 µg/ml human IgA1
(Calbiochem), and chloramphenicol was added to a final
concentration of 2 µg/ml. After overnight incubation
at 37°C, reaction mixtures were electrophoresed on a 10%
SDS-polyacrylamide gel, transferred to a nitrocellulose
membrane, and probed with goat anti-human IgA1 heavy
chain conjugated to alkaline phosphatase (Kirkegaard &
Perry). The membrane was developed by immersion in
phosphatase substrate solution
(5-bromo-4-chloro-3-indolylphosphate toluidinium-nitro blue tetrazolium
substrate system; Kirkegaard & Perry).

Immunoblot analysis. Immunoblot analysis of bacterial
whole cell lysates was carried out as described (St.
Geme et al., 1991).

Southern hybridization. Southern blotting was performed
using high stringency conditions as previously described
(St. Geme and Falkow, 1991).

Microscopy. 

i. Light microscopy. Samples of epithelial cells with
associated bacteria were stained with Giemsa stain and
examined by light microscopy as described (St. Geme and
Falkow, 1990).

ii. Transmission electron microscopy. For transmission
electron microscopy, bacteria were incubated with
epithelial cell monolayers for four hours and were then
rinsed four times with PBS, fixed with 2%
glutaraldehyde/1% osmium tetroxide in 0.1 M sodium
phosphate buffer pH 6.4 for two hours on ice, and
stained with 0.25% aqueous uranyl acetate overnight.
Samples were then dehydrated in graded ethanol solutions
and embedded in polybed. Ultrathin sections (0.4 µm)
were examined in a Philips 201c electron microscope.

As shown in Figure 2, DB117(pN187) incubated with
monolayers for four hours demonstrated intimate
interaction with the epithelial cell surface and was
occasionally found to be intracellular. In a given thin
section, invaded cells generally contained one or two
intracellular organisms. Of note, intracellular bacteria
were more common in sections prepared with strain N187,
an observation consistent with results using the
gentamicin assay. In contrast, examination of samples
prepared with strain DB117 carrying cloning vector alone
(pGJB103) failed to reveal internalized bacteria (not
shown).

Having described the preferred embodiments of the
present invention, it will appear to those of ordinary
skill in the art that various modifications may be made
to the disclosed embodiments, and that such
modifications are intended to be within the scope of the
present invention.
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Washington University, et al.
(ii) TITLE OF INVENTION: Haemophilus Adherence and Penetration Protein
(iii) NUMBER OF SEQUENCES: 9

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Flehr, Hohbach, Test, Albritton & Herbert
(B) STREET: 4 Embarcadero Center, Suite 3400
(C) CITY: San Francisco
(D) STATE: California
(E) COUNTRY: United States
(F) ZIP: 94111-4187

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: PCT/US95/
(B) FILING DATE:
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 08/296,791
(B) FILING DATE: 25 AUG 1994
(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Trecartin, Richard F.
(B) REGISTRATION NUMBER: 31,801
(C) REFERENCE/DOCKET NUMBER: FP-59941/RFT

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: (415) 781-1989
(B) TELEFAX: (415) 398-3249
(C) TELEX: 910 277299

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4319 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: both

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 60..4241

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
TCAATAGTCG TTTAACTAGT ATTTTTTAAT ACGAAAAATT ACTTAATTAA ATAAACATT 59 



ATG AAA AAA ACT GTA TTT CGT CTT AAT TTT TTA ACC GCT TGC ATT TCA
Met Lys Lys Thr Val Phe Arg Leu Asn Phe Leu Thr Ala Cys Ile Ser
1 5 10 15
TTA GGG ATA GTA TCG CAA GCG TGG GCT GGT CAC ACT TAT TTT GGG ATT
Leu Gly Ile Val Ser Gln Ala Trp Ala Gly His Thr Tyr Phe Gly Ile

A^ ^ I AT TAT CGT ^ TTT GCC GAG AAT AAA GGG AAG TTC ACA 

Asp Tyr Gin Tyr Tyr Arg Asp Phe Ala Glu Asn Lys Gly Lys Vh e 
" 40 45 

SI? A?I ^ t* 1 " .7 AAG GTT TAT AAC AAA CAA GGG CAA TTA GTT 

Val Gly Ala Gin Asn lie Lys Val Tyr Asn Lys Gin Gly Gin Leu Val 
3U SS 60 

GGC ACA TCA ATG ACA AAA GCC CCG ATG ATT GAT TTT TCT GTA GTG TCA 
Gly Thr Ser Met Thr Lys Ala Pro Met He Asp Phe Ser vVl Val Str 

70 7 S 80 

CGT AAC GGC GTG GCA GCC TTG GTT GAA AAT CAA TAT ATT rrr aw 
Arg Asn Gly Val Ala Ala Leu Val G^, tJn 3J ™ Val Ser V^l 

85 90 95 

GCA CAT AAC GTA GGA TAT ACA GAT GTT GAT TTT GGT GCA GAr rr-n B ao 
Ala His Asn Val Gly Tyr Thr Asp Val Asp ^7e Gly Ala Glu ISy Tsn 

105 no 

AAC CCC GAT CAA CAT CGT TTT ACT TAT AAG ATT GTA AAA CGA AAT AAC 
Asn Pro Asp Gin Hrs Arg Phe Thr Tyr Lys lie Val ^s ^ 

120 125 

TAC AAA AAA GAT AAT TTA CAT CCT TAT GAG GAC GAT TAC CAT AAT CCA 
Tyr Lys Lys Asp Asn Leu His Pro Tyr Glu Asp Asp ?£ Ss P^o 

135 140 

CGA TTA CAT AAA TTC GTT ACA GAA GCG GCT CCA ATT GAT ATG ACT TCT 
Arg Leu Hrs Lys Phe Val Thr Glu Ala Ala Pro l7e 2£ m™ T^r S« 

150 155 160 

**I m 1 ? AAT °? C AGT ACT TAT TCA GAT AGA ACA AAA TAT CCA GAA CGT 
Asn Met Asn Gly Ser Thr Tyr Ser Asp Arg Thr Lys TyV pA^fu^g 

165 170 =* 



175 



GTT CGT ATC GGC TCT GGA CGG CAG TTT TGG CGA AAT GAT CAA r*r 
Val Arg He Gly Ser Gly Arg Gin Phe Trp Arg A^p Si ™ 

185 190 

GGC GAC CAA GTT GCC GGT GCA TAT CAT TAT CTG ACA GCT GGC AAT ACA 
Gly Asp Gin Val Ala Gly Ala Tyr His Tyr Leu Thr Ala Gly Asn Thr 
195 200 205 

CAC AAT CAG CGT GGA GCA GGT AAT GGA TAT TCG TAT TTG GGA GGC GAT 
H,s Asn Gin Arg Gly Ala Gly Asn Gly Tyr Ser Tyr ITu Gly ^ly A^p 
iJAU 215 220 

GTT CGT AAA GCG GGA GAA TAT GGT CCA TTA CCG ATT GCA GGC TCA AAG 
Val Arg Lys Ala Gly Glu Tyr Gly Pro Leu Pro He Ala Gly Ser Lys 
25 230 235 240 

GGG GAC AGT GGT TCT CCG ATG TTT ATT TAT GAT GCT GAA AAA CAA AAA 
Gly Asp Ser Gly Ser Pro Met Phe He Tyr Asp Ala Glu Lys Gin l£ 
245 250 255 

TGG TTA ATT AAT GGG ATA TTA CGG GAA GGC AAC CCT TTT GAA GGC AAA 
Trp Leu He Asn Gly He Leu Arg Glu Gly Asn Pro Phe Glu Gly Tv S 
260 265 270 



[Nucleotide positions at the ends of the sixteen sequence rows above, in order: 155, 203, 251, 299, 347, 395, 443, 491, 539, 587, 635, 683, 731, 779, 827, 875]
GAA AAT GGG TTT CAA TTG GTT CGC AAA TCT TAT TTT GAT GAA ATT TTC 923
Glu Asn Gly Phe Gln Leu Val Arg Lys Ser Tyr Phe Asp Glu Ile Phe
275 280 285 

GAA AGA GAT TTA CAT ACA TCA CTT TAC ACC CGA GCT GGT AAT GGA GTG 971 
Glu Arg Asp Leu His Thr Ser Leu Tyr Thr Arg Ala Gly Asn Gly Val 
290 295 300 

TAC ACA ATT AGT GGA AAT GAT AAT GGT CAG GGG TCT ATA ACT CAG AAA 1019 
Tyr Thr He Ser Gly Asn Asp Asn Gly Gin Gly Ser He Thr Gin Lys 
305 310 315 320 

TCA GGA ATA CCA TCA GAA ATT AAA ATT ACG TTA GCA AAT ATG AGT TTA 1067
Ser Gly Ile Pro Ser Glu Ile Lys Ile Thr Leu Ala Asn Met Ser Leu
325 330 335 

CCT TTG AAA GAG AAG GAT AAA GTT CAT AAT CCT AGA TAT GAC GGA CCT 1115 
Pro Leu Lys Glu Lys Asp Lys Val His Asn Pro Arg Tyr Asp Gly Pro 
340 345 350 

AAT ATT TAT TCT CCA CGT TTA AAC AAT GGA GAA ACG CTA TAT TTT ATG 1163
Asn Ile Tyr Ser Pro Arg Leu Asn Asn Gly Glu Thr Leu Tyr Phe Met
355 360 365 

GAT CAA AAA CAA GGA TCA TTA ATC TTC GCA TCT GAC ATT AAC CAA GGG 1211 
Asp Gin Lys Gin Gly Ser Leu He Phe Ala Ser Asp He Asn Gin Gly 
370 375 380 

GCG GGT GGT CTT TAT TTT GAG GGT AAT TTT ACA GTA TCT CCA AAT TCT 1259 
Ala Gly Gly Leu Tyr Phe Glu Gly Asn Phe Thr Val Ser Pro Asn Ser 
385 390 395 400 

AAC CAA ACT TGG CAA GGA GCT GGC ATA CAT GTA AGT GAA AAT AGC ACC 1307 
Asn Gin Thr Trp Gin Gly Ala Gly He His Val Ser Glu Asn Ser Thr 
405 410 415 

GTT ACT TGG AAA GTA AAT GGC GTG GAA CAT GAT CGA CTT TCT AAA ATT 1355 
Val Thr Trp Lys Val Asn Gly Val Glu His Asp Arg Leu Ser Lys He 
420 425 " 430 

GGT AAA GGA ACA TTG CAC GTT CAA GCC AAA GGG GAA AAT AAA GGT TCG 1403 
Gly Lys Gly Thr Leu His Val Gin Ala Lys Gly Glu Asn Lys Gly Ser 
435 440 445 

ATC AGC GTA GGC GAT GGT AAA GTC ATT TTG GAG CAG CAG GCA GAC GAT 1451
Ile Ser Val Gly Asp Gly Lys Val Ile Leu Glu Gln Gln Ala Asp Asp
450 455 460 

CAA GGC AAC AAA CAA GCC TTT AGT GAA ATT GGC TTG GTT AGC GGC AGA 1499 
Gin Gly Asn Lys Gin Ala Phe Ser Glu He Gly Leu Val Ser Gly Arg 
465 470 475 480 

GGG ACT GTT CAA TTA AAC GAT GAT AAA CAA TTT GAT ACC GAT AAA TTT 1547 
Gly Thr Val Gin Leu Asn Asp Asp Lys Gin Phe Asp Thr Asp Lys Phe 
485 490 495 

TAT TTC GGC TTT CGT GGT GGT CGC TTA GAT CTT AAC GGG CAT TCA TTA 1595 
Tyr Phe Gly Phe Arg Gly Gly Arg Leu Asp Leu Asn Gly His Ser Leu 
S00 505 510 

ACC TTT AAA CGT ATC CAA AAT ACG GAC GAG GGG GCA ATG ATT GTG AAC 1643 
Thr Phe Lys Arg He Gin Asn Thr Asp Glu Gly Ala Met He Val Asn 
515 520 525 
CAT AAT ACA ACT CAA GCC GCT AAT GTC ACT ATT ACT GGG AAC GAA AGC
His Asn Thr Thr Gln Ala Ala Asn Val Thr Ile Thr Gly Asn Glu Ser
530 535 540 

ATT GTT CTA CCT AAT GGA AAT AAT ATT AAT AAA CTT GAT TAC AGA AAA 
lie Val Leu Pro Asn Gly Asn Asn He Asn Lys Leu Asp Tyr Arg^s 
545 550 555 560 

GAA ATT GCC TAC AAC GGT TGG TTT GGC GAA ACA GAT AAA AAT AAA CAC 
Glu He Ala Tyr Asn Gly Trp Phe Gly Glu Thr Asp Lys Asn Lys His 
565 570 575 

AAT GGG CGA TTA AAC CTT ATT TAT AAA CCA ACC ACA GAA GAT CGT ACT 
Asn Gly Arg Leu Asn Leu lie Tyr Lys Pro Thr Thr Glu Asp Arg ^Thr 
560 585 590 

TTG CTA CTT TCA GGT GGT ACA AAT TTA. AAA GGC GAT ATT ACC CAA ACA 
Leu Leu Leu Ser Gly Gly Thr Asn Leu Lys Gly Asp He Thr Gin Thr 
595 600 605 



AAA GGT AAA CTA TTT TTC AGC GGT AGA CCG ACA CCG CAC GCC TAC AAT 
Lys Gly Lys Leu Phe Phe Ser Gly Arg Pro Thr Pro His Ala Tyr Asn 
610 615 goo 



CAT TTA AAT AAA CGT TGG TCA GAA ATG GAA GGT ATA CCA CAA GGC GAA 
His Leu Asn Lys Arg Trp Ser Glu Met Glu Gly lie Pro Gin Gly Glu 
25 630 635 640 

tTI vl? S GG ?* T GAT 700 ATC AAC CGT ACA TTT AAA GCT GAA AAC 

He Val Trp Asp His Asp Trp He Asn Arg Thr Phe Lys Ala Glu Tsn 
645 650 655 

TTC CAA ATT AAA GGC GGA AGT GCG GTG GTT TCT CGC AAT GTT TCT TCA 
Phe Gin He Lys Gly Gly Ser Ala Val Val Ser Arg Asn vVl sVr ^r 
660 665 670 

tT* 2?° °. GA ^ T TGG ACA GTC AGC AAT GCA AAT GCC ACA TTT GGT 

He Glu Gly Asn Trp Thr Val Ser Asn Asn Ala Asn Ala Thr Vhe ^fly 
675 680 685 

GTT GTG CCA AAT CAA CAA AAT ACC ATT TGC ACG CGT TCA GAT TGG ACA 
Val val Pro Asn Gin Gin Asn Thr He Cys Thr Arg Ser Asp Tre Thr 
690 695 700 

r?C 71* ^ TGT CAA *** GTG GAT "A ACC GAT ACA AAA GTT ATT 

Gly Leu Thr Thr Cys Gin Lys Val Asp Leu Thr Asp Thr Lys Val He 

AAT TCT ATA CCA AAA ACA CAA ATC AAT GGC TCT ATT AAT TTA ACT GAT 
Asn ser He Pro Lys Thr Gin He Asn Gly Ser He Asn ™u Thr Asp 
725 730 735 

^ T *u G G ? G MT GTT GGT "A GCA AAA CTT AAT GGC AAT GTC 

Asn Ala Thr Ala Asn Val Lys Gly Leu Ala Lys Leu Asn Gly Asn Val 
740 745 750 

ACT TTA ACA AAT CAC AGC CAA TTT ACA TTA AGC AAC AAT GCC ACC CAA 
Thr Leu Thr Asn His Ser Gin Phe Thr Leu Ser Asn Asn Ala Thr Gin 
755 760 765 

ATA GGC AAT ATT CGA CTT TCC GAC AAT TCA ACT GCA ACG GTG GAT AAT 
He Gly Asn He Arg Leu Ser Asp Asn Ser Thr Ala Thr Val Asp Asn 

/7 ° 775 780 



[Nucleotide positions at the ends of the sixteen sequence rows above, in order: 1691, 1739, 1787, 1835, 1883, 1931, 1979, 2027, 2075, 2123, 2171, 2219, 2267, 2315, 2363, 2411]
GCA AAC TTG AAC GGT AAT GTG CAT TTA ACG GAT TCA GCT CAA TTT TCT 2459
Ala Asn Leu Asn Gly Asn Val His Leu Thr Asp Ser Ala Gln Phe Ser
785 790 795 800 

TTA AAA AAC AGC CAT TTT TCG CAC CAA ATT CAG GGA GAC AAA GGC ACA 2507 
Leu Lys Asn Ser His Phe Ser His Gin lie Gin Gly Asp Lys Gly Thr 
805 810 815 

ACA GTG ACG TTG GAA AAT GCG ACT TGG ACA ATG CCT AGC GAT ACT ACA 2555
Thr Val Thr Leu Glu Asn Ala Thr Trp Thr Met Pro Ser Asp Thr Thr
820 825 830 

TTG CAG AAT TTA ACG CTA AAT AAC AGT ACG ATC ACG TTA AAT TCA GCT 2603 
Leu Gin Asn Leu Thr Leu Asn Asn Ser Thr lie Thr Leu Asn Ser Ala 
835 840 845 

TAT TCA GCT AGC TCA AAC AAT ACG CCA CGT CGC CGT TCA TTA GAG ACG 2651
Tyr Ser Ala Ser Ser Asn Asn Thr Pro Arg Arg Arg Ser Leu Glu Thr
850 855 860 

GAA ACA ACG CCA ACA TCG GCA GAA CAT CGT TTC AAC ACA TTG ACA GTA 2699
Glu Thr Thr Pro Thr Ser Ala Glu His Arg Phe Asn Thr Leu Thr Val
865 870 875 880 

AAT GGT AAA TTG AGT GGG CAA GGC ACA TTC CAA TTT ACT TCA TCT TT* 2 7*7 

Asn Gly Lys Leu Ser Gly Gin Gly Thr Phe Gin Phe Thr Ser Ser Leu 
885 890 895 

TTT GGC TAT AAA AGC GAT AAA TTA AAA TTA TCC AAT GAC GCT GAG GGC 2795 
Phe Gly Tyr Lys Ser Asp Lys Leu Lys Leu Ser Asn Asp Ala Glu Gly 
900 905 910 

GAT TAC ATA TTA TCT GTT CGC AAC ACA GGC AAA GAA CCC GAA ACC CTT 2843
Asp Tyr Ile Leu Ser Val Arg Asn Thr Gly Lys Glu Pro Glu Thr Leu
915 920 925 

GAG CAA TTA ACT TTG GTT GAA AGC AAA GAT AAT CAA CCG TTA TCA GAT 2891 
Glu Gin Leu Thr Leu Val Glu Ser Lys Asp Asn Gin Pro Leu Ser Asp 
930 935 940 

AAG CTC AAA TTT ACT TTA GAA AAT GAC CAC GTT GAT GCA GGT GCA TTA 2939
Lys Leu Lys Phe Thr Leu Glu Asn Asp His Val Asp Ala Gly Ala Leu
945 950 955 960 

CGT TAT AAA TTA GTG AAG AAT GAT GGC GAA TTC CGC TTG CAT AAC CCA 2987 
Arg Tyr Lys Leu Val Lys Asn Asp Gly Glu Phe Arg Leu His Asn Pro 
965 970 975 

ATA AAA GAG CAG GAA TTG CAC AAT GAT TTA GTA AGA GCA GAG CAA GCA 3035 
He Lys Glu Gin Glu Leu His Asn Asp Leu Val Arg Ala Glu Gin Ala 
980 985 ^ 990 

GAA CGA ACA TTA GAA GCC AAA CAA GTT GAA CCG ACT GCT AAA ACA CAA 3083 
Glu Arg Thr Leu Glu Ala Lys Gin Val Glu Pro Thr Ala Lys Thr Gin 
99S 1000 1005 

ACA GGT GAG CCA AAA GTG CGG TCA AGA AGA GCA GCG AGA GCA GCG TTT 3131 
Thr Gly Glu Pro Lys Val Arg Ser Arg Arg Ala Ala Arg Ala Ala Phe 
1010 1015 1020 

CCT GAT ACC CTG CCT GAT CAA AGC CTG TTA AAC GCA TTA GAA GCC AAA 3179 
Pro Asp Thr Leu Pro Asp Gin Ser Leu Leu Asn Ala Leu Glu Ala Lys 
1025 1030 1035 ib40 
CAA GCT GAA CTG ACT GCT GAA ACA CAA AAA AGT AAG GCA AAA ACA AAA 3227
Gln Ala Glu Leu Thr Ala Glu Thr Gln Lys Ser Lys Ala Lys Thr Lys
1045 1050 1055 

AAA GTG CGG TCA AAA AGA GCA GTG TTT TCT GAT CCC CTG CTT GAT CAA 3275 
Lys Val Arg Ser Lys Arg Ala Val Phe Ser Asp Pro Leu Leu Asp Gin 
1060 1065 1070 

AGC CTG TTC GCA TTA GAA GCC GCA CTT GAG GTT ATT GAT GCC CCA CAG 3323
Ser Leu Phe Ala Leu Glu Ala Ala Leu Glu Val Ile Asp Ala Pro Gln
1075 1080 1085 

CAA TCG GAA AAA GAT CGT CTA GCT CAA GAA GAA GCG GAA AAA CAA CGC 3371 
Gin Ser Glu Lys Asp Arg Leu Ala Gin Glu Glu Ala Glu Lys Gin Aro 
1090 1095 1100 

AAA CAA AAA GAC TTG ATC AGC CGT TAT TCA AAT AGT GCG TTA TCA GAA 3419
Lys Gln Lys Asp Leu Ile Ser Arg Tyr Ser Asn Ser Ala Leu Ser Glu
1105 1110 1115 1120

TTA TCT GCA ACA GTA AAT AGT ATG CTT TCT GTT CAA GAT GAA TTA GAT 3467
Leu Ser Ala Thr Val Asn Ser Met Leu Ser Val Gln Asp Glu Leu Asp
1125 1130 1135

CGT CTT TTT GTA GAT CAA GCA CAA TCT GCC GTG TGG ACA AAT ATC GCA 3515 
Arg Leu Phe Val Asp Gin Ala Gin Ser Ala Val Trp Thr Asn He Ala 
1140 H45 H50 

CAG GAT AAA AGA CGC TAT GAT TCT GAT GCG TTC CGT GCT TAT CAG CAG 3563 
Gin Asp Lys Arg Arg Tyr Asp Ser Asp Ala Phe Arg Ala Tyr Gin Gin 
1155 H60 H65 

CAG AAA ACG AAC TTA CGT CAA ATT GGG GTG CAA AAA GCC TTA GCT AAT 3611 
Gin Lys Thr Asn Leu Arg Gin He Gly Val Gin Lys Ala Leu Ala Asn 
H70 H75 H80 

GGA CGA ATT GGG GCA GTT TTC TCG CAT AGC CGT TCA GAT AAT ACC TTT 3659 
Gly Arg He Gly Ala Val Phe Ser His Ser Arg Ser Asp Asn Thr Phe 
1185 H90 H95 1200 

GAT GAA CAG GTT AAA AAT CAC GCG ACA TTA ACG ATG ATG TCG GGT TTT 3707 
Asp Glu Gin Val Lys Asn His Ala Thr Leu Thr Met Met Ser Gly Phe 
1205 1210 1215 

GCC CAA TAT CAA TGG GGC GAT TTA CAA TTT GGT GTA AAC GTG GGA ACG 3755
Ala Gln Tyr Gln Trp Gly Asp Leu Gln Phe Gly Val Asn Val Gly Thr
1220 1225 1230 

GGA ATC AGT GCG AGT AAA ATG GCT GAA GAA CAA AGC CGA AAA ATT CAT 3803 
Gly He Ser Ala Ser Lys Met Ala Glu Glu Gin Ser Arg Lys He His 
1235 1240 1245 

CGA AAA GCG ATA AAT TAT GGC GTG AAT GCA AGT TAT CAG TTC CGT TTA 3851 
Arg Lys Ala He Asn Tyr Gly Val Asn Ala Ser Tyr Gin Phe Arq Leu 
1250 1255 1260 

GGG CAA TTG GGC ATT CAG CCT TAT TTT GGA GTT AAT CGC TAT TTT ATT 3899 
Gly Gin Leu Gly He Gin Pro Tyr Phe Gly Val Asn Arg Tyr Phe He 
1265 1270 1275 1280 

GAA CGT GAA AAT TAT CAA TCT GAG GAA GTG AGA GTG AAA ACG CCT AGC 3947 
Glu Arg Glu Asn Tyr Gin Ser Glu Glu Val Arg Val Lys Thr Pro Ser 
1285 1290 ~ 1295 
CTT GCA TTT AAT CGC TAT AAT GCT GGC ATT CGA GTT GAT TAT ACA TTT 3995
Leu Ala Phe Asn Arg Tyr Asn Ala Gly Ile Arg Val Asp Tyr Thr Phe
1300 1305 1310 

ACT CCG ACA GAT AAT ATC AGC GTT AAG CCT TAT TTC TTC GTC AAT TAT 4043
Thr Pro Thr Asp Asn Ile Ser Val Lys Pro Tyr Phe Phe Val Asn Tyr
1315 1320 1325 

GTT GAT GTT TCA AAC GCT AAC GTA CAA ACC ACG GTA AAT CTC ACG GTG 4091
Val Asp Val Ser Asn Ala Asn Val Gln Thr Thr Val Asn Leu Thr Val
1330 1335 1340 

TTG CAA CAA CCA TTT GGA CGT TAT TGG CAA AAA GAA GTG GGA TTA AAG 4139 
Leu Gln Gln Pro Phe Gly Arg Tyr Trp Gln Lys Glu Val Gly Leu Lys
1345 1350 1355 1360 

GCA GAA ATT TTA CAT TTC CAA ATT TCC GCT TTT ATC TCA AAA TCT CAA 4187 
Ala Glu Ile Leu His Phe Gln Ile Ser Ala Phe Ile Ser Lys Ser Gln
1365 1370 1375 

GGT TCA CAA CTC GGC AAA CAG CAA AAT GTG GGC GTG AAA TTG GGC TAT 4235
Gly Ser Gln Leu Gly Lys Gln Gln Asn Val Gly Val Lys Leu Gly Tyr
1380 1385 1390 



CGT TGG TAAAAATCAA CATAATTTTA TCGTTTATTG ATAAACAAGG TGGGTCAGAT 4291

Arg Trp 

CAGATCCCAC CTTTTTTATT CCAATAAT 4319 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1394 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Lys Lys Thr Val Phe Arg Leu Asn Phe Leu Thr Ala Cys Ile Ser
1 5 10 15

Leu Gly Ile Val Ser Gln Ala Trp Ala Gly His Thr Tyr Phe Gly Ile
20 25 30

Asp Tyr Gln Tyr Tyr Arg Asp Phe Ala Glu Asn Lys Gly Lys Phe Thr
35 40 45

Val Gly Ala Gln Asn Ile Lys Val Tyr Asn Lys Gln Gly Gln Leu Val
50 55 60

Gly Thr Ser Met Thr Lys Ala Pro Met Ile Asp Phe Ser Val Val Ser
65 70 75 80

Arg Asn Gly Val Ala Ala Leu Val Glu Asn Gln Tyr Ile Val Ser Val
85 90 95

Ala His Asn Val Gly Tyr Thr Asp Val Asp Phe Gly Ala Glu Gly Asn
100 105 110

Asn Pro Asp Gln His Arg Phe Thr Tyr Lys Ile Val Lys Arg Asn Asn
115 120 125

Tyr Lys Lys Asp Asn Leu His Pro Tyr Glu Asp Asp Tyr His Asn Pro
130 135 140

Arg Leu His Lys Phe Val Thr Glu Ala Ala Pro Ile Asp Met Thr Ser
145 150 155 160

Asn Met Asn Gly Ser Thr Tyr Ser Asp Arg Thr Lys Tyr Pro Glu Arg
165 170 175

Val Arg Ile Gly Ser Gly Arg Gln Phe Trp Arg Asn Asp Gln Asp Lys
180 185 190

Gly Asp Gln Val Ala Gly Ala Tyr His Tyr Leu Thr Ala Gly Asn Thr
195 200 205

His Asn Gln Arg Gly Ala Gly Asn Gly Tyr Ser Tyr Leu Gly Gly Asp
210 215 220

Val Arg Lys Ala Gly Glu Tyr Gly Pro Leu Pro Ile Ala Gly Ser Lys
225 230 235 240

Gly Asp Ser Gly Ser Pro Met Phe Ile Tyr Asp Ala Glu Lys Gln Lys
245 250 255

Trp Leu Ile Asn Gly Ile Leu Arg Glu Gly Asn Pro Phe Glu Gly Lys
260 265 270

Glu Asn Gly Phe Gln Leu Val Arg Lys Ser Tyr Phe Asp Glu Ile Phe
275 280 285

Glu Arg Asp Leu His Thr Ser Leu Tyr Thr Arg Ala Gly Asn Gly Val
290 295 300

Tyr Thr Ile Ser Gly Asn Asp Asn Gly Gln Gly Ser Ile Thr Gln Lys
305 310 315 320

Ser Gly Ile Pro Ser Glu Ile Lys Ile Thr Leu Ala Asn Met Ser Leu
325 330 335

Pro Leu Lys Glu Lys Asp Lys Val His Asn Pro Arg Tyr Asp Gly Pro
340 345 350

Asn Ile Tyr Ser Pro Arg Leu Asn Asn Gly Glu Thr Leu Tyr Phe Met
355 360 365

Asp Gln Lys Gln Gly Ser Leu Ile Phe Ala Ser Asp Ile Asn Gln Gly
370 375 380

Ala Gly Gly Leu Tyr Phe Glu Gly Asn Phe Thr Val Ser Pro Asn Ser
385 390 395 400

Asn Gln Thr Trp Gln Gly Ala Gly Ile His Val Ser Glu Asn Ser Thr
405 410 415

Val Thr Trp Lys Val Asn Gly Val Glu His Asp Arg Leu Ser Lys Ile
420 425 430

Gly Lys Gly Thr Leu His Val Gln Ala Lys Gly Glu Asn Lys Gly Ser
435 440 445

Ile Ser Val Gly Asp Gly Lys Val Ile Leu Glu Gln Gln Ala Asp Asp
450 455 460

Gln Gly Asn Lys Gln Ala Phe Ser Glu Ile Gly Leu Val Ser Gly Arg
465 470 475 480

Gly Thr Val Gln Leu Asn Asp Asp Lys Gln Phe Asp Thr Asp Lys Phe
485 490 495

Tyr Phe Gly Phe Arg Gly Gly Arg Leu Asp Leu Asn Gly His Ser Leu
500 505 510

Thr Phe Lys Arg Ile Gln Asn Thr Asp Glu Gly Ala Met Ile Val Asn
515 520 525

His Asn Thr Thr Gln Ala Ala Asn Val Thr Ile Thr Gly Asn Glu Ser
530 535 540

Ile Val Leu Pro Asn Gly Asn Asn Ile Asn Lys Leu Asp Tyr Arg Lys
545 550 555 560

Glu Ile Ala Tyr Asn Gly Trp Phe Gly Glu Thr Asp Lys Asn Lys His
565 570 575

Asn Gly Arg Leu Asn Leu Ile Tyr Lys Pro Thr Thr Glu Asp Arg Thr
580 585 590

Leu Leu Leu Ser Gly Gly Thr Asn Leu Lys Gly Asp Ile Thr Gln Thr
595 600 605

Lys Gly Lys Leu Phe Phe Ser Gly Arg Pro Thr Pro His Ala Tyr Asn
610 615 620

His Leu Asn Lys Arg Trp Ser Glu Met Glu Gly Ile Pro Gln Gly Glu
625 630 635 640

Ile Val Trp Asp His Asp Trp Ile Asn Arg Thr Phe Lys Ala Glu Asn
645 650 655

Phe Gln Ile Lys Gly Gly Ser Ala Val Val Ser Arg Asn Val Ser Ser
660 665 670

Ile Glu Gly Asn Trp Thr Val Ser Asn Asn Ala Asn Ala Thr Phe Gly
675 680 685

Val Val Pro Asn Gln Gln Asn Thr Ile Cys Thr Arg Ser Asp Trp Thr
690 695 700

Gly Leu Thr Thr Cys Gln Lys Val Asp Leu Thr Asp Thr Lys Val Ile
705 710 715 720

Asn Ser Ile Pro Lys Thr Gln Ile Asn Gly Ser Ile Asn Leu Thr Asp
725 730 735

Asn Ala Thr Ala Asn Val Lys Gly Leu Ala Lys Leu Asn Gly Asn Val
740 745 750

Thr Leu Thr Asn His Ser Gln Phe Thr Leu Ser Asn Asn Ala Thr Gln
755 760 765

Ile Gly Asn Ile Arg Leu Ser Asp Asn Ser Thr Ala Thr Val Asp Asn
770 775 780

Ala Asn Leu Asn Gly Asn Val His Leu Thr Asp Ser Ala Gln Phe Ser
785 790 795 800

Leu Lys Asn Ser His Phe Ser His Gln Ile Gln Gly Asp Lys Gly Thr
805 810 815

Thr Val Thr Leu Glu Asn Ala Thr Trp Thr Met Pro Ser Asp Thr Thr
820 825 830

Leu Gln *? n Leu Thr Asn Asn Ser Thr Ile Thr Leu Asn Ser Ala
835 840 845

Tyr Ser Ala Ser Ser Asn Asn Thr Pro Arg Arg Arg Ser Leu Glu Thr
850 855 860

Glu Thr Thr Pro Thr Ser Ala Glu His Arg Phe Asn Thr Leu Thr Val
865 870 875 880

Asn Gly Lys Leu Ser Gly Gln Gly Thr Phe Gln Phe Thr Ser Ser Leu
885 890 895

Phe Gly Tyr Lys Ser Asp Lys Leu Lys Leu Ser Asn Asp Ala Glu Gly
900 905 910

Asp Tyr Ile Leu Ser Val Arg Asn Thr Gly Lys Glu Pro Glu Thr Leu
915 920 925

Glu Gln Leu Thr Leu Val Glu Ser Lys Asp Asn Gln Pro Leu Ser Asp
930 935 940

Lys Leu Lys Phe Thr Leu Glu Asn Asp His Val Asp Ala Gly Ala Leu
945 950 955 960

Arg Tyr Lys Leu Val Lys Asn Asp Gly Glu Phe Arg Leu His Asn Pro
965 970 975

Ile Lys Glu Gln Glu Leu His Asn Asp Leu Val Arg Ala Glu Gln Ala
980 985 990

Glu Arg Thr Leu Glu Ala Lys Gln Val Glu Pro Thr Ala Lys Thr Gln
995 1000 1005

1010 G1U ^ tnf c Ser **» Ala Ala Ar 9 Ala Ala 

AUXU 1015 1020 

ioI 5 Asp Thr Leu Pro ^£ n Gln Ser Leu Leu Asn Ala Glu AIa *>y* 

1030 1035 1040 

Gln Ala Glu Leu Thr Ala Glu Thr Gln Lys Ser Lys Ala Lys Thr Lys
1045 1050 1055

Lys Val Arg Ser Lys Arg Ala Val Phe Ser Asp Pro Leu Leu Asp Gln
1060 1065 1070

Ser Leu Phe Ala Leu Glu Ala Ala Leu Glu Val Ile Asp Ala Pro Gln
1075 1080 1085

Gln Ser Glu Lys Asp Arg Leu Ala Gln Glu Glu Ala Glu Lys Gln Arg
1090 1095 1100

Lys Gln Lys Asp Leu Ile Ser Arg Tyr Ser Asn Ser Ala Leu Ser Glu
1105 1110 1115 1120

Leu Ser Ala Thr Val Asn Ser Met Leu Ser Val Gln Asp Glu Leu Asp
1125 1130 1135

Arg Leu Phe Val Asp Gln Ala Gln Ser Ala Val Trp Thr Asn Ile Ala
1140 1145 1150

Gln Asp Lys Arg Arg Tyr Asp Ser Asp Ala Phe Arg Ala Tyr Gln Gln
1155 1160 1165

Gln Lys Thr Asn Leu Arg Gln Ile Gly Val Gln Lys Ala Leu Ala Asn
1170 1175 1180

Gly Arg Ile Gly Ala Val Phe Ser His Ser Arg Ser Asp Asn Thr Phe
1185 1190 1195 1200

Asp Glu Gln Val Lys Asn His Ala Thr Leu Thr Met Met Ser Gly Phe
1205 1210 1215

Ala Gln Tyr Gln Trp Gly Asp Leu Gln Phe Gly Val Asn Val Gly Thr
1220 1225 1230

Gly Ile Ser Ala Ser Lys Met Ala Glu Glu Gln Ser Arg Lys Ile His
1235 1240 1245

Arg Lys Ala Ile Asn Tyr Gly Val Asn Ala Ser Tyr Gln Phe Arg Leu
1250 1255 1260

Gly Gln Leu Gly Ile Gln Pro Tyr Phe Gly Val Asn Arg Tyr Phe Ile
1265 1270 1275 1280

Glu Arg Glu Asn Tyr Gln Ser Glu Glu Val Arg Val Lys Thr Pro Ser
1285 1290 1295

Leu Ala Phe Asn Arg Tyr Asn Ala Gly Ile Arg Val Asp Tyr Thr Phe
1300 1305 1310

Thr Pro Thr Asp Asn Ile Ser Val Lys Pro Tyr Phe Phe Val Asn Tyr
1315 1320 1325

Val Asp Val Ser Asn Ala Asn Val Gln Thr Thr Val Asn Leu Thr Val
1330 1335 1340

Leu Gln Gln Pro Phe Gly Arg Tyr Trp Gln Lys Glu Val Gly Leu Lys
1345 1350 1355 1360

Ala Glu Ile Leu His Phe Gln Ile Ser Ala Phe Ile Ser Lys Ser Gln
1365 1370 1375

Gly Ser Gln Leu Gly Lys Gln Gln Asn Val Gly Val Lys Leu Gly Tyr
1380 1385 1390

Arg Trp



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1541 amino acids

(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Met Leu Asn Lys Lys Phe Lys Leu Asn Phe Ile Ala Leu Thr Val Ala
1 5 10 15

Tyr Ala Leu Thr Pro Tyr Thr Glu Ala Ala Leu Val Arg Asp Asp Val
20 25 30

Asp Tyr Gln Ile Phe Arg Asp Phe Ala Glu Asn Lys Gly Lys Phe Ser
35 40 45

Val Gly Ala Thr Asn Val Leu Val Lys Asp Lys Asn Asn Lys Asp Leu
50 55 60

Gly Thr Ala Leu Pro Asn Gly Ile Pro Met Ile Asp Phe Ser Val Val
65 70 75 80

Asp Val Asp Lys Arg Ile Ala Thr Leu Ile Asn Pro Gln Tyr Val Val
85 90 95

Gly Val Lys His Val Ser Asn Gly Val Ser Glu Leu His Phe Gly Asn
100 105 110

Leu Asn til Asn AS " Asn Ala Ala His Arg Asp Val
115 120 125

Ser Ser Glu Glu Asn Arg Tyr Phe Ser Val Glu Lys Asn Glu Tyr Pro
130 135 140

Thr Lys Leu Asn Gly Lys Thr Val Thr Thr Glu Asp Gln Thr Gln Lys
145 150 155 160

Arg Arg Glu Asp Tyr Tyr Met Pro Arg Leu Asp Lys Phe Val Thr Glu
165 170 175

Val Ala Pro Ile Glu Ala Ser Thr Ala Ser Ser Asp Ala Gly Thr Tyr
180 185 190

Asn Asp Gln Asn Lys Tyr Pro Ala Phe Val Arg Leu Gly Ser Gly Ser
195 200 205

Gln Phe Ile Tyr Lys Lys Gly Asp Asn Tyr Ser Leu Ile Leu Asn Asn
210 215 220

His Glu Val Gly Gly Asn Asn Leu Lys Leu Val Gly Asp Ala Tyr Thr
225 230 235 240

Tyr Gly Ile Ala Gly Thr Pro Tyr Lys Val Asn His Glu Asn Asn Gly
245 250 255

Leu Ile Gly Phe Gly Asn Ser Lys Glu Glu His Ser Asp Pro ^ Gly
260 265 270

Ile Leu Ser Gln Asp Pro Leu Thr Asn Tyr Ala Val Leu Gly Asp Ser
275 280 285

Gly Ser Pro Leu Phe Val Tyr Asp Arg Glu Lys Gly Lys Trp Leu Phe
290 295 300

Leu Gly Ser Tyr Asp Phe Trp Ala Gly Tyr Asn Lys Lys Ser Trp Gln
305 310 315 320

Glu Trp Asn Ile Tyr Lys Ser Gln Phe Thr Lys Asp Val Leu Asn Z>
325 330 335

Asp Ser Ala Gly Ser Leu Ile Gly Ser Lys Thr Asp Tyr Ser Trp Ser
340 345 350

Ser Asn Gly Lys Thr Ser Thr Ile Thr Gly Gly Glu Lys Ser Leu Asn
355 360 365

Val Asp Leu Ala Asp Gly Lys Asp Lys Pro Asn His Gly Lys Ser Val
370 375 380

Thr Phe Glu Gly Ser Gly Thr Leu Thr Leu Asn Asn Asn Ile Asp Gln
385 390 395 400

Gly Ala Gly Gly Leu Phe Phe Glu Gly Asp Tyr Glu Val Lys Gly Thr
405 410 415

Ser Asp Asn Thr Thr Trp Lys Gly Ala Gly Val Ser Val Ala Glu Gly
420 425 430

Lys Thr Val Thr Trp Lys Val His Asn Pro Gln Tyr Asp Arg Leu Ala
435 440 445

Lys Ile Gly Lys Gly Thr Leu Ile Val Glu Gly Thr Gly Asp Asn Lys
450 455 460

Gly Ser Leu Lys Val Gly Asp Gly Thr Val Ile Leu Lys Gln Gln Thr
465 470 475 480

Asn Gly Ser Gly Gln His Ala Phe Ala Ser Val Gly Ile Val Ser Gly
485 490 495

Arg Ser Thr Leu Val Leu Asn Asp Asp Lys Gln Val Asp Pro Asn Ser
500 505 510

Ile Tyr Phe Gly Phe Arg Gly Gly Arg Leu Asp Leu Asn Gly Asn Ser
515 520 525

Leu Thr Phe Asp His Ile Arg Asn Ile Asp Asp Gly Ala Arg Leu Val
530 535 540

Asn His Asn Met Thr Asn Ala Ser Asn Ile Thr Ile Thr Gly Glu Ser
545 550 555 560

Leu Ile Thr Asp Pro Asn Thr Ile Thr Pro Tyr Asn Ile Asp Ala Pro
565 570 575

Asp Glu Asp Asn Pro Tyr Ala Phe Arg Arg Ile Lys Asp Gly Gly Gln
580 585 590

Leu Tyr Leu Asn Leu Glu Asn Tyr Thr Tyr Tyr Ala Leu Arg Lys Gly
595 600 605

Ala Ser Thr Arg Ser Glu Leu Pro Lys Asn Ser Gly Glu Ser Asn Glu
610 615 620

Asn Trp Leu Tyr Met Gly Lys Thr Ser Asp Glu Ala Lys Arg Asn Val
625 630 635 640

Met Asn His Ile Asn Asn Glu Arg Met Asn Gly Phe Asn Gly Tyr Phe
645 650 655

Gly Glu Glu Glu Gly Lys Asn Asn Gly Asn Leu Asn Val Thr Phe Lys
660 665 670

Gly Lys Ser Glu Gln Asn Arg Phe Leu Leu Thr Gly Gly Thr Asn Leu
675 680 685

Asn Gly Asp Leu Thr Val Glu Lys Gly Thr Leu Phe Leu Ser Gly Arg
690 695 700

Pro Thr Pro His Ala Arg Asp Ile Ala Gly Ile Ser Ser Thr Lys Lys
705 710 715 720

Asp Pro His Phe Ala Glu Asn Asn Glu Val Val Val Glu Asp Asp Trp
725 730 735

Ile Asn Arg Asn Phe Lys Ala Thr Thr Met Asn Val Thr Gly Asn Ala
740 745 750

Ser Leu Tyr Ser Gly Arg Asn Val Ala Asn Ile Thr Ser Asn Ile Thr
755 760 765

Ala Ser Asn Lys Ala Gln Val His Ile Gly Tyr Lys Thr Gly Asp Thr
770 775 780

Val Cys Val Arg Ser Asp Tyr Thr Gly Tyr Val Thr Cys Thr Thr Asp
785 790 795 800

Lys Leu Hi Ala Leu Ser Phe Asn Pro Thr Asn Leu Arg
805 810 815

Gly Asn Val Asn Leu Thr Glu Ser Ala Asn Phe Val Leu Gly Lys Ala
820 825 830

Asn Leu Phe Gly Thr Ile Gln Ser Arg Gly Asn Ser Gln Val Arg Leu
835 840 845

Thr Glu Asn Ser His Trp His Leu Thr Gly Asn Ser Asp Val His Gln
850 855 860

in Asp Leu Ala Asn ss His Ile His Leu *■» Ser Ala Asp Asn Ser
865 870 875 880

Asn Asn Val Thr Lys Tyr Asn Thr Leu Thr Val Asn Ser Leu Ser Gly
885 890 895

Asn Gly Ser Phe Tyr Tyr Leu Thr Asp Leu Ser Asn Lys Gln Gly Asp
900 905 910

Lys Val Val Val Thr Lys Ser Ala Thr Gly Asn Phe Thr Z Gln Val
915 920 925

Ala Asp Lys Thr Gly Glu Pro Asn His Asn Glu Leu Thr Leu Phe Asp
930 935 940

Ala Ser Lys Ala Gln Arg Asp His Leu Asn Val Ser Leu Val Gly Asn
945 950 955 960

Thr Val Asp Leu Gly Ala Trp Lys Tyr Lys Leu Arg Asn Val Asn Gly
965 970 975

Arg Tyr Asp Leu Tyr Asn Pro Glu Val Glu Lys Arg Asn Gln Thr Val
980 985 990

Asp Thr Thr Asn Ile Thr Thr Pro Asn Asn Ile Gln Ala Asp Val Pro
995 1000 1005

Ser Val Pro Ser Asn Asn Glu Glu Ile Ala Arg Val Asp Glu Ala Pro
1010 1015 1020

Val Pro Pro Pro Ala Pro Ala Thr Pro Ser Glu Thr Thr Glu Thr Val
1025 1030 1035 1040

Ala Glu Asn Ser Lys Gln Glu Ser Lys Thr Val Glu Lys Asn Glu Gln
1045 1050 1055

Asp Ala Thr Glu Thr Thr Ala Gln Asn Arg Glu Val Ala Lys Glu Ala
1060 1065 1070

Lys Ser Asn Val Lys Ala Asn Thr Gln Thr Asn Glu Val Ala Gln Ser
1075 1080 1085

Gly Ser Glu Thr Lys Glu Thr Gln Thr Thr Glu Thr Lys Glu Thr Ala
1090 1095 1100

H05 Val ^ G1U Ma *>y* Val Glu Thr Glu Lys Thr Gin 

1115 1120

Glu Val Pro Lys Val Thr Ser Gln Val Ser Pro Lys Gln Glu Gln Ser
1125 1130 1135

Glu Thr Val Gln Pro Gln Ala Glu Pro Ala Arg Glu Asn Asp Pro Thr
1140 1145 1150

Val Asn Ile Lys Glu Pro Gln Ser Gln Thr Asn Thr Thr Ala Asp Thr
1155 1160 1165

Glu Gln Pro Ala Lys Glu Thr Ser Ser Asn Val Glu Gln Pro Val Thr
1170 1175 1180

Glu Ser Thr Thr Val Asn Thr Gly Asn Ser Val Val Glu Asn Pro Glu
1185 1190 1195 1200

Asn Thr Thr Pro Ala Thr Thr Gln Pro Thr Val Asn Ser Glu Ser Ser
1205 1210 1215

Asn Lys Pro Lys Asn Arg His Arg Arg Ser Val Arg Ser Val Pro His
1220 1225 1230

Asn Val Glu Pro Ala Thr Thr Ser Ser Asn Asp Arg Ser Thr Val Ala
1235 1240 1245

Leu Cys Asp Leu Thr Ser Thr Asn Thr Asn Ala Val Leu Ser Asp Ala
1250 1255 1260

Arg Ala Lys Ala Gln Phe Val Ala Leu Asn Val Gly Lys Ala Val Ser
1265 1270 1275 1280

Gln His Ile Ser Gln Leu Glu Met Asn Asn Glu Gly Gln Tyr Asn Val
1285 1290 1295

Trp Val Ser Asn Thr Ser Met Asn Lys Asn Tyr Ser Ser Ser Gln Tyr
1300 1305 1310

Arg Arg Phe Ser Ser Lys Ser Thr Gln Thr Gln Leu Gly Trp Asp Gln
1315 1320 1325

Thr Ile Ser Asn Asn Val Gln Leu Gly Gly Val Phe Thr Tyr Val Arg
1330 1335 1340

Asn Ser Asn Asn Phe Asp Lys Ala Thr Ser Lys Asn Thr Leu Ala Gln
1345 1350 1355 1360

Val Asn Phe Tyr Ser Lys Tyr Tyr Ala Asp Asn His Trp Tyr Leu Gly
1365 1370 1375

Ile Asp Leu Gly Tyr Gly Lys Phe Gln Ser Lys Leu Gln Thr Asn His
1380 1385 1390

Asn Ala Lys Phe Ala Arg His Thr Ala Gln Phe Gly Leu Thr Ala Gly
1395 1400 1405

Lys Ala Phe Asn Leu Gly Asn Phe Gly Ile Thr Pro Ile Val Gly Val
1410 1415 1420

Arg Tyr Ser Tyr Leu Ser Asn Ala Asp Phe Ala Leu Asp Gln Ala Arg
1425 1430 1435 1440

Ile Lys Val Asn Pro Ile Ser Val Lys Thr Ala Phe Ala Gln Val Asp
1445 1450 1455

Leu Ser Tyr Thr Tyr His Leu Gly Glu Phe Ser Val Thr Pro Ile Leu
1460 1465 1470

Ser Ala Arg Tyr Asp Ala Asn Gln Gly Ser Gly Lys Ile Asn Val Asn
1475 1480 1485



VEo** 9 ^ ?f o C Val G1U Asn Gln Gln Gln Asn Ala 

1490 1495 1500

Gly Leu Lys Leu Lys Tyr His Asn Val Lys Leu Ser Leu Ile Gly Gly
1505 1510 1515 1520

Leu Thr Lys Ala Lys Gln Ala Glu Lys Gln Lys Thr Ala Glu Leu Lys
1525 1530 1535

Leu Ser Phe Ser Phe 
1540 

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1545 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: unknown

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Leu Asn Lys Lys Phe Lys Leu Asn Phe Ile Ala Leu Thr Val Ala
1 5 10 15

Tyr Ala Leu Thr Pro Tyr Thr Glu Ala Ala Leu Val Arg Asp Asp Val
20 25 30

Asp Tyr Gln Ile Phe Arg Asp Phe Ala Glu Asn Lys Gly Lys Phe Ser
35 40 45

Val Gly Ala Thr Asn Val Glu Val Arg Asp Lys Asn Asn Arg Pro Leu
50 55 60

Gly Asn Val Leu Pro Asn Gly Ile Pro Met Ile Asp Phe Ser Val Val
65 70 75 80

Asp Val Asp Lys Arg Ile Ala Thr Leu Val Asn Pro Gln Tyr Val Val
85 90 95

Gly Val Lys His Val Ser Asn Gly Val Ser Glu Leu His Phe Gly Asn
100 105 110

Leu Asn Gly Asn Met Asn Asn Gly Asn Ala Lys Ala His Arg Asp Val
115 120 125

Ser Ser Glu Glu Asn Arg Tyr Tyr Thr Val Glu Lys Asn Glu Tyr Pro
130 135 140

Thr Lys Leu Asn Gly Lys Ala Val Thr Thr Glu Asp Gln Ala Gln Lys
145 150 155 160

Arg Arg Glu Asp Tyr Tyr Met Pro Arg Leu Asp Lys Phe Val Thr Glu
165 170 175

Val Ala Pro Ile Glu Ala Ser Thr Asp Ser Ser Thr Ala Gly Thr Tyr
180 185 190

Asn Asn Lys Asp Lys Tyr Pro Tyr Phe Val Arg Leu Gly Ser Gly Thr
195 200 205

Gln Phe Ile Tyr Glu Asn Gly Thr Arg Tyr Glu Leu Trp Leu Gly Lys
210 215 220

Glu Gly Gln Lys Ser Asp Ala Gly Gly Tyr Asn Leu Lys Leu Val Gly
225 230 235 240

Asn Ala Tyr Thr Tyr Gly Ile Ala Gly Thr Pro Tyr Glu Val Asn His
245 250 255

Glu Asn Asp Gly Leu Ile Gly Phe Gly Asn Ser Asn Asn Glu Tyr Ile
260 265 270

Asn Pro Lys Glu Ile Leu Ser Lys Lys Pro Leu Thr Asn Tyr Ala Val
275 280 285

Leu Gly Asp Ser Gly Ser Pro Leu Phe Val Tyr Asp Arg Glu Lys Gly
290 295 300

Lys Trp Leu Phe Leu Gly Ser Tyr Asp Tyr Trp Ala Gly Tyr Asn Lys
305 310 315 320

Lys Ser Trp Gln Glu Trp Asn Ile Tyr Lys Pro Glu Phe Ala Glu Lys
325 330 335

Ile Tyr Glu Gln Tyr Ser Ala Gly Ser Leu Ile Gly Ser Lys Thr Asp
340 345 350

Tyr Ser Trp Ser Ser Asn Gly Lys Thr Ser Thr Ile Thr Gly Gly Glu
355 360 365

Lys Ser Leu Asn Val Asp Leu Ala Asp Gly Lys Asp Lys Pro Asn His
370 375 380

Gly Lys Ser Val Thr Phe Glu Gly Ser Gly Thr Leu Thr Leu Asn Asn
385 390 395 400

Asn Ile Asp Gln Gly Ala Gly Gly Leu Phe Phe Glu Gly Asp Tyr Glu
405 410 415

Val Lys Gly Thr Ser Asp Asn Thr Thr Trp Lys Gly Ala Gly Val Ser
420 425 430

Val Ala Glu Gly Lys Thr Val Thr Trp Lys Val His Asn Pro Gln Tyr
435 440 445

Asp Arg Leu Ala Lys Ile Gly Lys Gly Thr Leu Ile Val Glu Gly Thr
450 455 460

Gly Asp Asn Lys Gly Ser Leu Lys Val Gly Asp Gly Thr Val Ile Leu
465 470 475 480

Lys Gln Gln Thr Asn Gly Ser Gly Gln His Ala Phe Ala Ser Val Gly
485 490 495

Ile Val Ser Gly Arg Ser Thr Leu Val Leu Asn Asp Asp Lys Gln Val
500 505 510

Asp Pro Asn Ser Ile Tyr Phe Gly Phe Arg Gly Gly Arg Leu Asp Leu
515 520 525

Asn Gly Asn Ser Leu Thr Phe Asp His Ile Arg Asn Ile Asp Glu Gly
530 535 540

Ala Arg Leu Val Asn His Ser Thr Ser Lys His Ser Thr Val Thr Ile
545 550 555 560

Thr Gly Asp Asn Leu Ile Thr Asp Pro Asn Asn Val Ser Ile Tyr Tyr
565 570 575

Val Lys Pro Leu Glu Asp Asp Asn Pro Tyr Ala Ile Arg Gln Ile Lys
580 585 590

Tyr Gly Tyr Gln Leu Tyr Phe Asn Glu Glu Asn Arg Thr Tyr Tyr Ala
595 600 605

Leu Lys Lys Asp Ala Ser Ile Arg Ser Glu Phe Pro Gln Asn Arg Gly
610 615 620

625 ASn Ser Leu Met G1 y Thr Glu Lys Ala Asp Ala 

635 640 
Gln Lys Asn Ala Met Asn His Ile Asn Asn Glu Arg Met Asn Gly Phe
645 650 655

Asn Gly Tyr Phe Gly Glu Glu Glu Gly Lys Asn Asn Gly Asn Leu Asn
660 665 670

Val Thr Phe Lys Gly Lys Ser Glu Gln Asn Arg Phe Leu Leu Thr Gly
675 680 685

Gly Thr Asn Leu Asn Gly Asp Leu Asn Val Gln Gln Gly Thr Leu Phe
690 695 700

Leu Ser Gly Arg Pro Thr Pro His Ala Arg Asp Ile Ala Gly Ile Ser
705 710 715 720

Ser Thr Lys Lys Asp Ser His Phe Ser Glu Asn Asn Glu Val Val Val
725 730 735

Glu Asp Asp Trp Ile Asn Arg Asn Phe Lys Ala Thr Asn Ile Asn Val
740 745 750

Thr Asn Asn Ala Thr Leu Tyr Ser Gly Arg Asn Val Glu Ser Ile Thr
755 760 765

Ser Asn Ile Thr Ala Ser Asn Asn Ala Lys Val His Ile Gly Tyr Lys
770 775 780

Ala Gly Asp Thr Val Cys Val Arg Ser Asp Tyr Thr Gly Tyr Val Thr
785 790 795 800

Cys Thr Thr Asp Lys Leu Ser Asp Lys Ala Leu Asn Ser Phe Asn Pro
805 810 815

Thr Asn Leu Arg Gly Asn Val Asn Leu Thr Glu Ser Ala Asn Phe Val
820 825 830

Leu Gly Lys Ala Asn Leu Phe Gly Thr Ile Gln Ser Arg Gly Asn Ser
835 840 845

Gln Val Arg Leu Thr Glu Asn Ser His Trp His Leu Thr Gly Asn Ser
850 855 860

Asp Val His Gln Leu Asp Leu Ala Asn Gly His Ile His Leu Asn Ser
865 870 875 880

Ala Asp Asn Ser Asn Asn Val Thr Lys Tyr Asn Thr Leu Thr Val Asn
885 890 895

Ser Leu Ser Gly Asn Gly Ser Phe Tyr Tyr Leu Thr Asp Leu Ser Asn
900 905 910

Lys Gln Gly Asp Lys Val Val Val Thr Lys Ser Ala Thr Gly Asn Phe
915 920 925

Thr Leu Gln Val Ala Asp Lys Thr Gly Glu Pro Asn His Asn Glu Leu
930 935 940

Thr Leu Phe Asp Ala Ser Lys Ala Gln Arg Asp His Leu Asn Val Ser
945 950 955 960

Leu Val Gly Asn Thr Val Asp Leu Gly Ala Trp Lys Tyr Lys Leu Arg
965 970 975

Asn Val Asn Gly Arg Tyr Asp Leu Tyr Asn Pro Glu Val Glu Lys Arg
980 985 990

Asn Gln Thr Val Asp Thr Thr Asn Ile Thr Thr Pro Asn Asn Ile Gln
995 1000 1005

Ala Asp Val Pro Ser Val Pro Ser Asn Asn Glu Glu Ile Ala Arg Val
1010 1015 1020

Asp Glu Ala Pro Val Pro Pro Pro Ala Pro Ala Thr Pro Ser Glu Thr
1025 1030 1035 1040

Thr Glu Thr Val Ala Glu Asn Ser Lys Gln Glu Ser Lys Thr Val Glu
1045 1050 1055

Lys Asn Glu Gln Asp Ala Thr Glu Thr Thr Ala Gln Asn Arg Glu Val
1060 1065 1070

Ala Lys Glu Ala Lys Ser Asn Val Lys Ala Asn Thr Gln Thr Asn Glu
1075 1080 1085

Val Ala Gln Ser Gly Ser Glu Thr Lys Glu Thr Gln Thr Thr Glu Thr
1090 1095 1100

Lys Glu Thr Ala Thr Val Glu Lys Glu Glu Lys Ala Lys Val Glu Thr
1105 1110 1115 1120

Glu Lys Thr Gln Glu Val Pro Lys Val Thr Ser Gln Val Ser Pro Lys
1125 1130 1135

Gln Glu Gln Ser Glu Thr Val Gln Pro Gln Ala Glu Pro Ala Arg Glu
1140 1145 1150

Asn Asp Pro Thr Val Asn Ile Lys Glu Pro Gln Ser Gln Thr Asn Thr
1155 1160 1165

Thr Ala Asp Thr Glu Gln Pro Ala Lys Glu Thr Ser Ser Asn Val Glu
1170 1175 1180

Gln Pro Val Thr Glu Ser Thr Thr Val Asn Thr Gly Asn Ser Val Val
1185 1190 1195 1200

Glu Asn Pro Glu Asn Thr Thr Pro Ala Thr Thr Gln Pro Thr Val Asn
1205 1210 1215

Ser Glu Ser Ser Asn Lys Pro Lys Asn Arg His Arg Arg Ser Val Arg
1220 1225 1230

Ser Val Pro His Asn Val Glu Pro Ala Thr Thr Ser Ser Asn Asp Arg
1235 1240 1245

Ser Thr Val Ala Leu Cys Asp Leu Thr Ser Thr Asn Thr Asn Ala Val
1250 1255 1260



WO 96/05958 



PCT/US95/10661




Leu Ser Asp Ala Arg Ala Lys Ala Gln Phe Val Ala Leu Asn Val Gly
1265 1270 1275 1280

Lys Ala Val Ser Gln His Ile Ser Gln Met Asn Asn Glu Gly
1285 1290 1295

Gln Tyr Asn Val Trp Val Ser Asn Thr Ser Met Asn Lys Asn Tyr Ser
1300 1305 1310

Ser Ser Gln Tyr Arg Arg Phe Ser Ser Lys Ser Thr Gln Thr Gln Leu
1315 1320 1325

Gly Trp Asp Gln Thr Ile Ser Asn Asn Val Gln Leu Gly Gly Val Phe
1330 1335 1340

Thr Tyr Val Arg Asn Ser Asn Asn Phe Asp Lys Ala Thr Ser Lys Asn
1345 1350 1355 1360

Thr Leu Ala Gln Val Asn Phe Tyr Ser Lys Tyr Tyr Ala Asp Asn His
1365 1370 1375

Trp Tyr Leu Gly Ile Asp Leu Gly Tyr Gly Lys Phe Gln Ser Lys Leu
1380 1385 1390

Gln Thr Asn His Asn Ala Lys Phe Ala Arg His Thr Ala Gln Phe Gly
1395 1400 1405

Hlo* 1 * ^ Ala JSs*" LCU ^ «» «y He Thr Pro 

1415 1420 

SI 5 Val GlY ^ TTf 0 Ser ^r Asn Ala Asp Phe Ala Leu 

1435 1440 
Asp Gln Ala Arg Ile Lys Val Asn Pro Ile Ser Val Lys Thr Ala Phe
1445 1450 1455

Ala Gln Val Asp Leu Ser Tyr Thr Tyr His Leu Gly Glu Phe Ser Val
1460 1465 1470

Thr Pro Ile Leu Ser Ala Arg Tyr Asp Ala Asn Gln Gly Ser Gly Lys
1475 1480 1485

Ile Asn Val Asn Gly Tyr Asp Phe Ala Tyr Asn Val Glu Asn Gln Gln
1490 1495 1500

150S Tyr AS " Gly f^** W "is Asn Val Lys Leu Ser 

1515 1520 
Leu Ile Gly Gly Leu Thr Lys Ala Lys Gln Ala Glu Lys Gln Lys Thr
1525 1530 1535

Ala Glu Leu Lys Leu Ser Phe Ser Phe 
1540 1545 

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1702 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Met Leu Asn Lys Lys Phe Lys Leu Asn Phe Ile Ala Leu Thr Val Ala
1 5 10 15

Tyr Ala Leu Thr Pro Tyr Thr Glu Ala Ala Leu Val Arg Asp Asp Val
20 25 30

Asp Tyr Gln Ile Phe Arg Asp Phe Ala Glu Asn Lys Gly Arg Phe Ser
35 40 45

Val Gly Ala Thr Asn Val Glu Val Arg Asp Lys Asn Asn His Ser Leu
50 55 60

Gly Asn Val Leu Pro Asn Gly Ile Pro Met Ile Asp Phe Ser Val Val
65 70 75 80

Asp Val Asp Lys Arg Ile Ala Thr Leu Ile Asn Pro Gln Tyr Val Val
85 90 95

Gly Val Lys His Val Ser Asn Gly Val Ser Glu Leu His Phe Gly Asn
100 105 110

Leu Asn Gly Asn Met Asn Asn Gly Asn Asp Lys Ser His Arg Asp Val
115 120 125

Ser Ser Glu Glu Asn Arg Tyr Phe Ser Val Glu Lys Asn Glu Tyr Pro
130 135 140

Thr Lys Leu Asn Gly Lys Ala Val Thr Thr Glu Asp Gln Thr Gln Lys
145 150 155 160

Arg Arg Glu Asp Tyr Tyr Met Pro Arg Leu Asp Lys Phe Val Thr Glu
165 170 175

Val Ala Pro Ile Glu Ala Ser Thr Ala Ser Ser Asp Ala Gly Thr Tyr
180 185 190

Asn Asp Gln Asn Lys Tyr Pro Ala Phe Val Arg Leu Gly Ser Gly Thr
195 200 205

Gln Phe Ile Tyr Lys Lys Gly Asp Asn Tyr Ser Leu Ile Leu Asn Asn
210 215 220

His Glu Val Gly Gly Asn Asn Leu Lys Leu Val Gly Asp Ala Tyr Thr
225 230 235 240

Tyr Gly Ile Ala Gly Thr Pro Tyr Lys Val Asn His Glu Asn Asn Gly
245 250 255

Leu Ile Gly Phe Gly Asn Ser Lys Glu Glu His Ser Asp Pro Lys Gly
260 265 270

Ile Leu Ser Gln Asp Pro Leu Thr Asn Tyr Ala Val Leu Gly Asp Ser
275 280 285

Gly Ser Pro Leu Phe Val Tyr Asp Arg Glu Lys Gly Lys Trp Leu Phe
290 295 300

Leu Gly Ser Tyr Asp Phe Trp Ala Gly Tyr Asn Lys Lys Ser Trp Gln
305 310 315 320

Glu Trp Asn Ile Tyr Lys Pro Glu Phe Ala Lys Thr Val Leu Asp Lys
325 330 335

Asp Thr Ala Gly Ser Leu Ile Gly Ser Asn Thr Gln Tyr Asn Trp Asn
340 345 350

Pro Thr Gly Lys Thr Ser Val Ile Ser Asn Gly Ser Glu Ser Leu Asn
355 360 365

Val Asp Leu Phe Asp Ser Ser Gln Asp Thr Asp Ser Lys Lys Asn Asn
370 375 380

His Gly Lys Ser Val Thr Leu Arg Gly Ser Gly Thr Leu Thr Leu Asn
385 390 395 400

Asn Asn Ile Asp Gln Gly Ala Gly Gly Leu Phe Phe Glu Gly Asp Tyr
405 410 415

Glu Val Lys Gly Thr Ser Asp Ser Thr Thr Trp Lys Gly Ala Gly Val
420 425 430

Ser Val Ala Asp Gly Lys Thr Val Thr Trp Lys Val His Asn Pro Lys
435 440 445

Ser Asp Arg Leu Ala Lys Ile Gly Lys Gly Thr Leu Ile Val Glu Gly
450 455 460

Lys Gly Glu Asn Lys Gly Ser Leu Lys Val Gly Asp Gly Thr Val Ile
465 470 475 480

Leu Lys Gln Gln Ala Asp Ala Asn Asn Lys Val Lys Ala Phe Ser Gln
485 490 495

Val Gly Ile Val Ser Gly Arg Ser Thr Val Val Leu Asn Asp Asp Lys
500 505 510

Gln Val Asp Pro Asn Ser Ile Tyr Phe Gly Phe Arg Gly Gly Arg Leu
515 520 525

Asp Ala Asn Gly Asn Asn Leu Thr Phe Glu His Ile Arg Asn Ile Asp
530 535 540

Asp Gly Ala Arg Leu Val Asn His Asn Thr Ser Lys Thr Ser Thr Val
545 550 555 560

Thr Ile Thr Gly Glu Ser Leu Ile Thr Asp Pro Asn Thr Ile Thr Pro
565 570 575

Tyr Asn Ile Asp Ala Pro Asp Glu Asp Asn Pro Tyr Ala Phe Arg Arg
580 585 590

Ile Lys Asp Gly Gly Gln Leu Tyr Leu Asn Leu Glu Asn Tyr Thr Tyr
595 600 605

Tyr Ala Leu Arg Lys Gly Ala Ser Thr Arg Ser Glu Leu Pro Lys Asn
610 615 620

Ser Gly Glu Ser Asn Glu Asn Trp Leu Tyr Met Gly Lys Thr Ser Asp
625 630 635 640

Ala Ala Lys Arg Asn Val Met Asn His Ile Asn Asn Glu Arg Met Asn
645 650 655

Gly Phe Asn Gly Tyr Phe Gly Glu Glu Glu Gly Lys Asn Asn Gly Asn
660 665 670

Leu Asn Val Thr Phe Lys Gly Lys Ser Glu Gln Asn Arg Phe Leu Leu
675 680 685

Thr Gly Gly Thr Asn Leu Asn Gly Asp Leu Lys Val Glu Lys Gly Thr
690 695 700

Leu Phe Leu Ser Gly Arg Pro Thr Pro His Ala Arg Asp Ile Ala Gly
705 710 715 720

Ile Ser Ser Thr Lys Lys Asp Gln His Phe Ala Glu Asn Asn Glu Val
725 730 735

Val Val Glu Asp Asp Trp Ile Asn Arg Asn Phe Lys Ala Thr Asn Ile
740 745 750

Asn Val Thr Asn Asn Ala Thr Leu Tyr Ser Gly Arg Asn Val Ala Asn
755 760 765

Ile Thr Ser Asn Ile Thr Ala Ser Asp Asn Ala Lys Val His Ile Gly
770 775 780

Tyr Lys Ala Gly Asp Thr Val Cys Val Arg Ser Asp Tyr Thr Gly Tyr
785 790 795 800

Val Thr Cys Thr Thr Asp Lys Leu Ser Asp Lys Ala Leu Asn Ser Phe
805 810 815

Asn Ala Thr Asn Val Ser Gly Asn Val Asn Leu Ser Gly Asn Ala Asn
820 825 830

Phe Val Leu Gly Lys Ala Asn Leu Phe Gly Thr Ile Ser Gly Thr Gly
835 840 845

Asn Ser Gln Val Arg Leu Thr Glu Asn Ser His Trp His Leu Thr Gly
850 855 860

Asp Ser Asn Val Asn Gln Leu Asn Leu Asp Lys Gly His Ile His Leu
865 870 875 880

Asn Ala Gln Asn Asp Ala Asn Lys Val Thr Thr Tyr Asn Thr Leu Thr
885 890 895

Val Asn Ser Leu Ser Gly Asn Gly Ser Phe Tyr Tyr Leu Thr Asp Leu
900 905 910

Ser Asn Lys Gln Gly Asp Lys Val Val Val Thr Lys Ser Ala Thr Gly
915 920 925

Asn Phe Thr Leu Gln Val Ala Asp Lys Thr Gly Glu Pro Thr Lys Asn
930 935 940

Glu Leu Thr Leu Phe Asp Ala Ser Asn Ala Thr Arg Asn Asn Leu Asn
945 950 955 960

Val Ser Leu Val Gly Asn Thr Val Asp Leu Gly Ala Trp Lys Tyr Lys
965 970 975

Leu Arg Asn Val Asn Gly Arg Tyr Asp Leu Tyr Asn Pro Glu Val Glu
980 985 990

Lys Arg Asn Gln Thr Val Asp Thr Thr Asn Ile Thr Thr Pro Asn Asn
995 1000 1005

Ile Gln Ala Asp Val Pro Ser Val Pro Ser Asn Asn Glu Glu Ile Ala
1010 1015 1020

Arg Val Glu Thr Pro Val Pro Pro Pro Ala Pro Ala Thr Pro Ser Glu
1025 1030 1035 1040

Thr Thr Glu Thr Val Ala Glu Asn Ser Lys Gln Glu Ser Lys Thr Val
1045 1050 1055

Glu Lys Asn Glu Gln Asp Ala Thr Glu Thr Thr Ala Gln Asn Gly Glu
1060 1065 1070

Val Ala Glu Glu Ala Lys Pro Ser Val Lys Ala Asn Thr Gln Thr Asn
1075 1080 1085

G1U So" 8 Gly ^^ 5 G1U Thr Glu Glu JJ^ln Thr Thr Glu 

Ile Lys Glu Thr Ala Lys Val Glu Lys Glu Glu Lys Ala Lys Val Glu
1105 1110 1115 1120

Lys Glu Glu Lys Ala Lys Val Glu Lys Asp Glu Ile Gln Glu Ala Pro
1125 1130 1135

Gln Met Ala Ser Glu Thr Ser Pro Lys Gln Ala Lys Pro Ala Pro Lys
1140 1145 1150

Glu Val Ser Thr Asp Thr Lys Val Glu Glu Thr Gln Val Gln Ala Gln
1155 1160 1165

Pro Gln Thr Gln Ser Thr Thr Val Ala Ala Ala Glu Ala Thr Ser Pro
1170 1175 1180

As^ Ser Lys Pro Ala Glu Glu Thr Gln Pro Ser Glu Lys Thr Asn Ala
1185 1190 1195 1200

Glu Pro Val Thr Pro Val Val Ser Lys Asn Gln Thr Glu Asn Thr Thr
1205 1210 1215

Asp Gln Pro Thr Glu Arg Glu Lys Thr Ala Lys Val Glu Thr Glu Lys
1220 1225 1230

Thr Gln Gl^ Pro Pro Gln Val Ala Ser Gln Ala Ser Pro Lys Gln Glu
1235 1240 1245

Gln Ser Glu Thr Val Gln Pro Gln Ala Val Leu Glu Ser Glu Asn Val
1250 1255 1260

Pro Thr Val Asn Asn Ala Glu Glu Val Gln Ala Gln Leu Gln Thr Gln
1265 1270 1275 1280

Thr Ser Ala Thr Val Ser Thr Lys Gln Pro Ala Pro Glu Asn Ser Ile
1285 1290 1295

Asn Thr Gly Ser Ala Thr Ala Ile Thr Glu Thr Ala Glu Lys Ser Asp
1300 1305 1310

Lys Pro Gln Thr Glu Thr Ala Ala Ser Thr Glu Asp Ala Ser Gln His
1315 1320 1325

Lys Ala Asn Thr Val Ala As^ Asn Ser Val Ala As^ Asn Ser Glu Ser
1330 1335 1340

Ser Glu Pro Lys Ser Arg Arg Arg Arg Ser Ile Ser Gln Pro Gln Glu
1345 1350 1355 1360

Thr Ser Ala Glu Glu Thr Thr Ala Ala Ser Thr Asp Glu Thr Thr Ile
1365 1370 1375

Ala Asp Asn Ser Lys Arg Ser Lys Pro Asn Arg Arg Ser Arg A^ Ser
1380 1385 1390

Val Arg Ser Glu Pro Thr Val Thr Asn Gly Ser Asp Arg Ser Thr Val
1395 1400 1405

Ala Leu Arg Asp Leu Thr Ser Thr Asn Thr Asn Ala Val Ile Ser Asp
1410 1415 1420

Ala Met Ala Lys Ala Gln Phe Val Ala Leu Asn Val Gly Lys Ala Val
1425 1430 1435 1440

Ser Gln His Ile Ser Gln Leu Glu Met Asn Asn Glu Gly Gln Tyr Asn
1445 1450 1455

Val Trp Val Ser Asn Thr Ser Met Asn Glu Asn Tyr Ser Ser Ser Gln
1460 1465 1470

Tyr Arg Arg Phe Ser Ser Lys Ser Thr Gln Thr Gln Leu Gly Trp Asp
1475 1480 1485

Gln Thr Ile Ser Asn Asn Val Gln Leu Gly Gly Val Phe Thr Tyr Val
1490 1495 1500

Arg Asn Ser Asn Asn Phe Asp Lys Ala Ser Ser Lys Asn Thr Leu Ala
1505 1510 1515 1520

Gln Val Asn Phe Tyr Ser Lys Tyr Tyr Ala Asp Asn His Trp Tyr Leu
1525 1530 1535

Gly Ile Asp Leu Gly Tyr Gly Lys Phe Gln Ser Asn Leu Lys Thr Asn
1540 1545 1550

His Asn Ala Lys Phe Ala Arg His Thr Ala Gln Phe Gly Leu Thr Ala
1555 1560 1565

Gly Lys Ala Phe Asn Leu Gly Asn Phe Gly Ile Thr Pro Ile Val Gly
1570 1575 1580

Val Arg Tyr Ser Tyr Leu Ser Asn Ala Asn Phe Ala Leu Ala Lys Asp
1585 1590 1595 1600

Arg Ile Lys Val Asn Pro Ile Ser Val Lys Thr Ala Phe Ala Gln Val
1605 1610 1615

Asp Leu Ser Tyr Thr Tyr His Leu Gly Glu Phe Ser Val Thr Pro Ile
1620 1625 1630

Leu Ser Ala Arg Tyr Asp Thr Asn Gln Gly Ser Gly Lys Ile Asn Val
1635 1640 1645

Asn Gln Tyr Asp Phe Ala Tyr Asn Val Glu Asn Gln Gln Gln Tyr Asn
1650 1655 1660

Ala Gly Leu Lys Leu Lys Tyr His Asn Val Lys Leu Ser Leu Ile Gly
1665 1670 1675 1680

Gly Leu Thr Lys Ala Lys Gln Ala Glu Lys Gln Lys Thr Ala Glu Leu
1685 1690 1695

Lys Leu Ser Phe Ser Phe
1700 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1848 amino acids 
(B) TYPE: amino acid 
(D) TOPOLOGY: unknown 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Leu Asn Lys Lys Phe Lys Leu Asn Phe Ile Ala Leu Thr Val Ala 
1 5 10 15 
Tyr Ala Leu Thr Pro Tyr Thr Glu Ala Ala Leu Val Arg Asp Asp Val 
20 25 30 

Asp Tyr Gln Ile Phe Arg Asp Phe Ala Glu Asn Lys Gly Lys Phe Ser 
35 40 45 

Val Gly Ala Thr Asn Val Glu Val Arg Asp Lys Lys Asn Gln Ser Leu 
50 55 60 

Gly Ser Ala Leu Pro Asn Gly Ile Pro Met Ile Asp Phe Ser Val Val 
65 70 75 80 

Asp Val Asp Lys Arg Ile Ala Thr Leu Val Asn Pro Gln Tyr Val Val 
85 90 95 

Gly Val Lys His Val Ser Asn Gly Val Ser Glu Leu His Phe Gly Asn 
100 105 110 

Leu Asn Gly Asn Met Asn Asn Gly Asn Ala Lys Ser His Arg Asp Val 
115 120 125 

Ser Ser Glu Glu Asn Arg Tyr Tyr Thr Val Glu Lys Asn Asn Phe Pro 
130 135 140 

Thr Glu Asn Val Thr Ser Phe Thr Lys Glu Glu Gln Asp Ala Gln Lys 
145 150 155 160 

Arg Arg Glu Asp Tyr Tyr Met Pro Arg Leu Asp Lys Phe Val Thr Glu 
165 170 175 

Val Ala Pro Ile Glu Ala Ser Thr Ala Asn Asn Asn Lys Gly Glu Tyr 
180 185 190 

Asn Asn Ser Asp Lys Tyr Pro Ala Phe Val Arg Leu Gly Ser Gly Thr 
195 200 205 

Gln Phe Ile Tyr Lys Lys Gly Ser Arg Tyr Gln Leu Ile Leu Thr Glu 
210 215 220 

Lys Asp Lys Gln Gly Asn Leu Leu Arg Asn Trp Asp Val Gly Gly Asp 
225 230 235 240 

Asn Leu Glu Leu Val Gly Asn Ala Tyr Thr Tyr Gly Ile Ala Gly Thr 
245 250 255 

Pro Tyr Lys Val Asn His Glu Asn Asn Gly Leu Ile Gly Phe Gly Asn 
260 265 270 

Ser Lys Glu Glu His Ser Asp Pro Lys Gly Ile Leu Ser Gln Asp Pro 
275 280 285 

Leu Thr Asn Tyr Ala Val Leu Gly Asp Ser Gly Ser Pro Leu Phe Val 
290 295 300 

Tyr Asp Arg Glu Lys Gly Lys Trp Leu Phe Leu Gly Ser Tyr Asp Phe 
305 310 315 320 

Trp Ala Gly Tyr Asn Lys Lys Ser Trp Gln Glu Trp Asn Ile Tyr Lys 
325 330 335 

His Glu Phe Ala Glu Lys Ile Tyr Gln Gln Tyr Ser Ala Gly Ser Leu 
340 345 350 

Ile Gly Ser Asn Thr Gln Tyr Thr Trp Gln Ala Thr Gly Ser Thr Ser 
355 360 365 
Thr Ile Thr Gly Gly Gly Glu Pro Leu Ser Val Asp Leu Thr Asp Gly 
370 375 380 

Lys Asp Lys Pro Asn His Gly Lys Ser Ile Thr Leu Lys Gly Ser Gly 
385 390 395 400 

Thr Leu Thr Leu Asn Asn His Ile Asp Gln Gly Ala Gly Gly Leu Phe 
405 410 415 

Phe Glu Gly Asp Tyr Glu Val Lys Gly Thr Ser Asp Ser Thr Thr Trp 
420 425 430 

Lys Gly Ala Gly Val Ser Val Ala Asp Gly Lys Thr Val Thr Trp Lys 
435 440 445 

Val His Asn Pro Lys Tyr Asp Arg Leu Ala Lys Ile Gly Lys Gly Thr 
450 455 460 

Leu Val Val Glu Gly Lys Gly Lys Asn Glu Gly Leu Leu Lys Val Gly 
465 470 475 480 

Asp Gly Thr Val Ile Leu Lys Gln Lys Ala Asp Ala Asn Asn Lys Val 
485 490 495 

Gln Ala Phe Ser Gln Val Gly Ile Val Ser Gly Arg Ser Thr Leu Val 
500 505 510 

Leu Asn Asp Asp Lys Gln Val Asp Pro Asn Ser Ile Tyr Phe Gly Phe 
515 520 525 

Arg Gly Gly Arg Leu Asp Leu Asn Gly Asn Ser Leu Thr Phe Asp His 
530 535 540 

Ile Arg Asn Ile Asp Asp Gly Ala Arg Val Val Asn His Asn Met Thr 
545 550 555 560 

Asn Thr Ser Asn Ile Thr Ile Thr Gly Glu Ser Leu Ile Thr Asn Pro 
565 570 575 

Asn Thr Ile Thr Ser Tyr Asn Ile Glu Ala Gln Asp Asp Asp His Pro 
580 585 590 

Leu Arg Ile Arg Ser Ile Pro Tyr Arg Gln Leu Tyr Phe Asn Gln Asp 
595 600 605 

Asn Arg Ser Tyr Tyr Thr Leu Lys Lys Gly Ala Ser Thr Arg Ser Glu 
610 615 620 

Leu Pro Gln Asn Ser Gly Glu Ser Asn Glu Asn Trp Leu Tyr Met Gly 
625 630 635 640 

Arg Thr Ser Asp Ala Ala Lys Arg Asn Val Met Asn His Ile Asn Asn 
645 650 655 

Glu Arg Met Asn Gly Phe Asn Gly Tyr Phe Gly Glu Glu Glu Thr Lys 
660 665 670 

Ala Thr Gln Asn Gly Lys Leu Asn Val Thr Phe Asn Gly Lys Ser Asp 
675 680 685 

Gln Asn Arg Phe Leu Leu Thr Gly Gly Thr Asn Leu Asn Gly Asp Leu 
690 695 700 

Asn Val Glu Lys Gly Thr Leu Phe Leu Ser Gly Arg Pro Thr Pro His 
705 710 715 720 



WO 96/05858 



PCT/US95/10661 






Ala Arg Asp Ile Ala Gly Ile Ser Ser Thr Lys Lys Asp Pro His Phe 
725 730 735 

Thr Glu Asn Asn Glu Val Val Val Glu Asp Asp Trp Ile Asn Arg Asn 
740 745 750 

Phe Lys Ala Thr Thr Met Asn Val Thr Gly Asn Ala Ser Leu Tyr Ser 
755 760 765 

Gly Arg Asn Val Ala Asn Ile Thr Ser Asn Ile Thr Ala Ser Asn Asn 
770 775 780 

Ala Gln Val His Ile Gly Tyr Lys Thr Gly Asp Thr Val Cys Val Arg 
785 790 795 800 

Ser Asp Tyr Thr Gly Tyr Val Thr Cys His Asn Ser Asn Leu Ser Glu 
805 810 815 

Lys Ala Leu Asn Ser Phe Asn Pro Thr Asn Leu Arg Gly Asn Val Asn 
820 825 830 

Leu Thr Glu Asn Ala Ser Phe Thr Leu Gly Lys Ala Asn Leu Phe Gly 
835 840 845 

Thr Ile Gln Ser Ile Gly Thr Ser Gln Val Asn Leu Lys Glu Asn Ser 
850 855 860 

His Trp His Leu Thr Gly Asn Ser Asn Val Asn Gln Leu Asn Leu Thr 
865 870 875 880 

Asn Gly His Ile His Leu Asn Ala Gln Asn Asp Ala Asn Lys Val Thr 
885 890 895 

Thr Tyr Asn Thr Leu Thr Val Asn Ser Leu Ser Gly Asn Gly Ser Phe 
900 905 910 

Tyr Tyr Trp Val Asp Phe Thr Asn Asn Lys Ser Asn Lys Val Val Val 
915 920 925 

Asn Lys Ser Ala Thr Gly Asn Phe Thr Leu Gln Val Ala Asp Lys Thr 
930 935 940 

Gly Glu Pro Asn His Asn Glu Leu Thr Leu Phe Asp Ala Ser Asn Ala 
945 950 955 960 

Thr Arg Asn Asn Leu Glu Val Thr Leu Ala Asn Gly Ser Val Asp Arg 
965 970 975 

Gly Ala Trp Lys Tyr Lys Leu Arg Asn Val Asn Gly Arg Tyr Asp Leu 
980 985 990 

Tyr Asn Pro Glu Val Glu Lys Arg Asn Gln Thr Val Asp Thr Thr Asn 
995 1000 1005 

Ile Thr Thr Pro Asn Asp Ile Gln Ala Asp Ala Pro Ser Ala Gln Ser 
1010 1015 1020 

Asn Asn Glu Glu Ile Ala Arg Val Glu Thr Pro Val Pro Pro Pro Ala 
1025 1030 1035 1040 

Pro Ala Thr Glu Ser Ala Ile Ala Ser Glu Gln Pro Glu Thr Arg Pro 
1045 1050 1055 

Ala Glu Thr Ala Gln Pro Ala Met Glu Glu Thr Asn Thr Ala Asn Ser 
1060 1065 1070 
Thr Glu Thr Ala Pro Lys Ser Asp Thr Ala Thr Gln Thr Glu Asn Pro 
1075 1080 1085 

Asn Ser Glu Ser Val Pro Ser Glu Thr Thr Glu Lys Val Ala Glu Asn 
1090 1095 1100 

Pro Pro Gln Glu Asn Glu Thr Val Ala Lys Asn Glu Gln Glu Ala Thr 
1105 1110 1115 1120 

Glu Pro Thr Pro Gln Asn Gly Glu Val Ala Lys Glu Asp Gln Pro Thr 
1125 1130 1135 

Val Glu Ala Asn Thr Gln Thr Asn Glu Ala Thr Gln Ser Glu Gly Lys 
1140 1145 1150 

Thr Glu Glu Thr Gln Thr Ala Glu Thr Lys Ser Glu Pro Thr Glu Ser 
1155 1160 1165 

Val Thr Val Ser Glu Asn Gln Pro Glu Lys Thr Val Ser Gln Ser Thr 
1170 1175 1180 

Glu Asp Lys Val Val Val Glu Lys Glu Glu Lys Ala Lys Val Glu Thr 
1185 1190 1195 1200 

Glu Glu Thr Gln Lys Ala Pro Gln Val Thr Ser Lys Glu Pro Pro Lys 
1205 1210 1215 

Gln Ala Glu Pro Ala Pro Glu Glu Val Pro Thr Asp Thr Asn Ala Glu 
1220 1225 1230 

Glu Ala Gln Ala Leu Gln Gln Thr Gln Pro Thr Thr Val Ala Ala Ala 
1235 1240 1245 

Glu Thr Thr Ser Pro Asn Ser Lys Pro Ala Glu Glu Thr Gln Gln Pro 
1250 1255 1260 

Ser Glu Lys Thr Asn Ala Glu Pro Val Thr Pro Val Val Ser Glu Asn 
1265 1270 1275 1280 

Thr Ala Thr Gln Pro Thr Glu Thr Glu Glu Thr Ala Lys Val Glu Lys 
1285 1290 1295 

Glu Lys Thr Gln Glu Val Pro Gln Val Ala Ser Gln Glu Ser Pro Lys 
1300 1305 1310 

Gln Glu Gln Pro Ala Ala Lys Pro Gln Ala Gln Thr Lys Pro Gln Ala 
1315 1320 1325 

Glu Pro Ala Arg Glu Asn Val Leu Thr Thr Lys Asn Val Gly Glu Pro 
1330 1335 1340 

Gln Pro Gln Ala Gln Pro Gln Thr Gln Ser Thr Ala Val Pro Thr Thr 
1345 1350 1355 1360 

Gly Glu Thr Ala Ala Asn Ser Lys Pro Ala Ala Lys Pro Gln Ala Gln 
1365 1370 1375 

Ala Lys Pro Gln Thr Glu Pro Ala Arg Glu Asn Val Ser Thr Val Asn 
1380 1385 1390 

Thr Lys Glu Pro Gln Ser Gln Thr Ser Ala Thr Val Ser Thr Glu Gln 
1395 1400 1405 

Pro Ala Lys Glu Thr Ser Ser Asn Val Glu Gln Pro Ala Pro Glu Asn 
1410 1415 1420 
Ser Ile Asn Thr Gly Ser Ala Thr Thr Met Thr Glu Thr Ala Glu Lys 

1435 1440 
Ser Asp Lys Pro Gl^Met Glu Thr Val Thr^Glu Asn Asp Arg Gl^Pro 

Glu Ala Asn Th^Val Ala Asp Asn SerVal Ala Asn Asn Jer^lTser 

Ser Glu SerLys Ser Arg Arg Arg Arg Ser Val Ser Gin Pro Lys Glu 

1480 14Q5 

Thr Serbia Glu Glu Thr Thr Val Ala Ser Thr Gin Glu Thr Thr Val 

1495 1500 

libs* 8 " ir i0 PT ° ^ Pr ° Ar * -* *** **9 Thr Arg Arg 

1515 1520 
Ser val Gin Thr As^Ser Ty r GXu Pro Va^Glu Leu P ro Thr Glu^Asn 

Ala Glu Asn Al^Glu Asn Val Gin Se^Gly Asn Asn Val MaL^Ser 

Gin Pro Al^Leu Arg Asn Leu Thr Ser Lys Asn Thr Asn AiTval lie 

1560 1565 
Ser As^Ala Met Ala Lys Al^Gln Phe Val Ala LeuAsn Val Gly Lys 

AiTval Ser Gin His Il^Ser Gin Leu Glu Met Asn Asn Glu Gly Gin 

1595 1600 



Tyr Asn val Trp ll^Ser Asn Thr Ser Me^Asn Lys Asn Tyr Server 

Glu Gin Tyr Ar^Arg Phe Ser Ser Lys Ser Thr Gin Thr Gin L^ly 

1€25 1630 
Trp Asp Gl^Thr He Ser Asn AsnVal Gin Leu Gly Gly Val Phe Thr 



1645 



Tyr Va^Arg Asn Ser Asn Asn Phe Asp Lys Ala Ser Ser Lys Asn Thr 

1660 

Tes^ V ^ »g 0 ^ S « «*• ^r Tyr Ala Asp Asn His Trp 



1675 1680 
Tyr Leu Gly He As^Leu Gly Tyr Gly Lys Phe Gin Ser Asn Leu Gin 

1690 1695 

Thr Asn Asn As^Ala Lys Phe Ala Arg His Thr Ala Gin lie Gly Leu 

1705 1710 

Thr Ala Gly Lys Ala Phe Asn Leu Gly Asn Phe A l a t 

1715 i7->n Ala Val Lys Pro Thr 

1720 1725 

Val Glyval Arg Tyr Ser Tyr^eu Ser Asn Ala Asp^he Ala Leu Ala 

Gln s Asp Arg lie Lys Val^Asn Pro lie Ser Val Lys Thr Ala Phe Ala 

1755 1760 
Gin Val Asp Leu Ser^Tyr Thr Tyr His Leu Gly Glu Phe Ser He Thr 

1770 1775 
Pro Ile Leu Ser Ala Arg Tyr Asp Ala Asn Gln Gly Asn Gly Lys Ile 
1780 1785 1790 

Asn Val Ser Val Tyr Asp Phe Ala Tyr Asn Val Glu Asn Gln Gln Gln 
1795 1800 1805 

Tyr Asn Ala Gly Leu Lys Leu Lys Tyr His Asn Val Lys Leu Ser Leu 
1810 1815 1820 

Ile Gly Gly Leu Thr Lys Ala Lys Gln Ala Glu Lys Gln Lys Thr Ala 
1825 1830 1835 1840 

Glu Val Lys Leu Ser Phe Ser Phe 
1845 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

Gly Asp Ser Gly Ser Pro Met Phe 
1 5 

(2) INFORMATION FOR SEQ ID NO:8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Gly Asp Ser Gly Ser Pro Leu Phe 
1 5 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

His Thr Tyr Phe Gly Ile Asp 
1 5 
CLAIMS 



1. A recombinant Haemophilus adhesion and penetration 
protein. 

2. A recombinant Haemophilus adhesion and penetration 
protein according to claim 1 which has a sequence 
homologous to that shown in Figure 6. 

3. A recombinant Haemophilus adhesion and penetration 
protein according to claim 1 which has the sequence 
shown in Figure 6. 

4. A recombinant nucleic acid encoding an Haemophilus 
adhesion and penetration protein. 

5. The nucleic acid of claim 4 comprising DNA having 
a sequence homologous to that shown in Figure 6. 

6. An expression vector comprising transcriptional and 
translational regulatory nucleic acid operably linked 
to nucleic acid encoding an Haemophilus adhesion and 
penetration protein. 

7. A host cell transformed with an expression vector 
comprising a nucleic acid encoding an Haemophilus 
adhesion and penetration protein. 

8. A method of producing an Haemophilus adhesion and 
penetration protein comprising: 

a) culturing a host cell transformed with an 
expression vector comprising a nucleic acid 
encoding an Haemophilus adhesion and penetration 
protein; and 

b) expressing said nucleic acid to produce an 
Haemophilus adhesion and penetration protein. 

9. A vaccine comprising a pharmaceutically acceptable 
carrier and an Haemophilus adhesion and penetration 
protein for prophylactic or therapeutic use in 
generating an immune response. 

10. A vaccine according to claim 9 wherein said 
Haemophilus adhesion and penetration protein has a 
sequence homologous to that shown in Figure 6. 

11. A monoclonal antibody capable of binding to an 
Haemophilus adhesion and penetration protein. 

12. A method of treating or preventing Haemophilus 
influenzae infection comprising administering the 
vaccine of claim 9 or 10. 
TCAATACTCI^MCTACTATTt^ 

-35 - 10 1 : 

... 130 150 170 



E_ 

190 



210 230 250 270 



GCCCACAA^MGAAGTTC^ 

*tt 310 330 350 

GCCCCGATGATTGATTrTTCTGTAGTGTCACGTAACGGCCTCGCACCCTTGGTTGAA 

APMIDFSVVSRNCVAALVtliqTiv^TAniiT 

__ ft 390 419 430 450 

GGATATA?rGATGTTGATTTTGGTGCAGAGGGAAACAACCCCGATCAACATCGTTT^ 

CYTOVDFCAECMNPOQHRFTYKIVKRHWTR 
470 490 510 530 

™ sto 590 610 630 

AATATGAA^GCAGTACTTATTCAGATAGAACAAAATATCCAGAACGTCTTCGT^ 

^ €70 6^0 710 

GACAAAGGCGACCAAGTTGCCGGTCCATATCATTATCTGACAGCTGGCAATACACACAATCAGCGTGGAGCA 
DKGOQVAGAYHYlTAGNTHNqKtiAW n u t * i 

_ 770 790 810 

TT CGGA GGCGAT CTT C CT AAA 6CG G6 ACAAT AT G GT CCATT ACC G ATT 6 C AG GCT C A AA G G G ^ AC * **J^j A ^ 

LGGOVRKAGE YGPtPI AGSKGOS G S PMF IY 

83© 850 870 890 

GATGCTGAAAAACAAAAATGGTTAATTAATGCGATATTACGGGAAGGCAACCCTTTTGAA 

OAEKQKWLlNGItREGNPFEGKENCFQLVR 



AAAT CTT ATTTT G AT G A A ATTTT C G A A A6AG ATTT A CAT AC AT C ACTTT AC A C C C GA G CTC **J* ** A y ^ GT AC AC AATT A GT GGA A AT 
KSYFOEIFEROLHTSLYTRAGNGVYT 

1010 1030 1«50 1070 

GATAATGCTCAGCCGTCTATAAaCAGAAATCAGGAA^^ 

ONCQGSITQKSGXPSEIKlTLANMSLPLKt 

1ftQa ^0 1130 1150 1170 

AAG GAT AAAGTTC ATAAT CCT AGAT ATGACG GAC CT AAT ATTTATTCT C CAC CTTT A AA CAATGGAGA^CGCT ATATTTT ATGG AT C A A 
KOKVHNPRYDGPHlYSPRLHHtot i L t r « u v 

iiqo 1210 1230 1250 

AAA CAAGGATCATTAATC^CGCATCTGACATTAACCAACGGGCGGGTGGT CTTT ATTT ^ 

ICQGSLIFASDIMQGAGGLYFEGNFTVSPNb 

,™ 12 Qe 1310 1330 13S0 

AACCAAACTTGGCAAGGAGCTGGCATACATGTAAGTGAAAATAGCACCGTTACTTGGAAAGTAAATGGCGTCGAA^ 
NQTWQGAGIHVSENSTVTWKVNGvtHDKLS 

.„„ 1390 1410 1*30 

AAAATTGGTAAACGAAC^TGCACGTTCAAGCCAAACGGGAAAATAAAGCTTCGATCAGCGT 

KIGKGTLHVQAK6EMK6SISVG0GKVILEQ 
?C CCAC A CC A TC AA CCCAAC r c M CCCTTTA^^ 

1630 1650 1670 1Ma 

WCWCCfiCCCAATWTTCTSAACUTWTACAACTCAACCCCCTjWTCTCACTATTACTW 

1730 1750 1770 19aA 

«T r ATTAA,..A^JAT,««A r c^„«CTAC«C^^ 

1810 1830 1350 ____ 

A>CCTTATTTATAAACCAACCACAGAA«TC CT ACT^^ 

1910 3530 iqca 

^^CACCCCTACACCCACACCCCACC^ 

1990 2010 2030 

.tt«t«, M jatc.«„,«.,ca,« CT «.,x,aaa ; ^ 

2090 2110 2130 ca 

TmCAATT„ e « r ™«A vCT „«CA.,..T ; CA^ 

2178 2190 2210 

TCA^TTCCACACCATTAACCAaTCTCAAAAA^ 

2270 2290 2310 „ M 

tctattaatttaactcataatccaaccccgaatcttaaacct^ 

2350 2370 2390 5l1ft 

2450 2470 2490 

CTCCATTTAACCCATTCACaCAATTTTC^ 

^TJ^CmCACA^ 

cctacctcaaacaatJ^cacctcccccttcattaPacc^ 

2710 2730 2750 ■ , 

AATJCTAAA^GACTCGCCAACCCACAnCCAATTTACTTCATCTTTA^ 

2810 2830 2850 

CACCGCCATTACATATTATCTCTTCCCAACACACCCAAACAACCCCAA^ 
, BQft 2930 2950 2970 

2990 3010 3030 3050 

ttccgcttgcataaccouwaaaagagcagcaattccacaatgattc 

FRLHHPI K E Q E L H N D I V R A E Q A E RT L . t * A K Q 

wo 3090 3U0 3130 3150 

CTTGAAttCACTGCTAAAACACAAACAGCTCAG^^ 

VEPTAKTQTCE PKVR5RRAARAAF POT LPD 

3170 3190 3210 3230 

CAAA GC CT 6TT AAA C G CATT A CAA G C CAA A C AA G CT C AA CT G A CT G CT G AA A CA CAA AAAAGTA AG G CA A AA ACA AAAAA A GT GC G GT C A 
QSLLMA IEAKQA ELTAETQKSKAICTKKVR5 

3250 3270 3290 3310 3330 

AAAAGAGC^TGTTTTCTGATCCCCTGCTTGATCAAAGCCTCTTCGCATTAGAAGCCCCACTTGACGT^ 

KR A V FSDPLLOQSLFALEAALEVIOAPQQS 

33S0 3370 3390 3410 

GAAAAAGATCGTCTAGCTCAAGAAGAAGCGGAAAAACAACGCAAACAAAAACACTTGATCAGCCCnTATTCAAATAGTGCGTTATCACAA 
£K D R L AQEEAEKQRKQKOLISRYSNSALSE 

3430 3450 3470 3490 3S10 

jjajctGCAACAGTAAATACTATGCTTTCTGTTCAAGATGAATTAGATCGTCTTTTTGTAGATCAAGCACAATCTGCCGTGTGGACAAAT 
LSATVNSMISVQDELDRIFVDQAQSAVWTN 



3530 3550 3570 

ATCCCACAGGATAAAAGACGCTATGATTCTGATGCGTTCCGTGCTTATCAGCAGCAGAAAACGAACTTACCT 
IAQDKRRYOSDA F RAYQQQKTML RQIGVQK 

3610 3630 3650 3670 3690 

cccttagctaatggacgaattggggcagttttctcgcatagccgttcagataatacctttgatgaacaggttaaaaatcacgc^ 

ALANGRI GAVFSHSRS DKTFOEQVKMHATL 

3710 3730 3750 3770 

XCGAT6ATGTCGGGTTTTGCCCAATATCAATGGGGCGATTTACAATTTGGTGTAAACGTGGGAACGGGAATCAGTCCGAGTAAAATGGCT 
TMMSGFAQYQWGDLQF GVNVGTGISAS KMA 

3790 3810 3830 3850 3870 

GAAGAACAAAGCCGAAAAATTCATCGAAAAGCGATAAATTATCGCGTGAATGCAAGTTATCAGTTCCGTTTAGGGCAATTGGGCATTCAG 
EEQSRKIHRKAINYGVMASYQFRLGQLGIQ 

3890 3910 3930 3950 

CCTTATTTTGGAGTTAATCGCTATTTTATTGAACGTGAAAATTATCAATCTGAGGAAGT6AGAGTGAAAACGCCTACCCT 
PYFGVNRYFI ERENYQSEEVRVKTPSLAFN 